<a href="https://colab.research.google.com/github/shaoyinguo-portfolio/CorpGenie-exp/blob/main/Meeting2TechDoc.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook demonstrates generating a technical document from key frames and transcripts, using multimodal models hosted on OpenRouter.

Access tokens are handled by the Colab Secrets manager. Sign up for OpenRouter and create your own API key, named `CorpGenie`.

## Key Steps:

1. Load key frames and transcript lines, then combine and sort them by timestamp.
2. Break the sorted events into chunks of bounded payload size.
3. Feed each chunk into the LLM together with the previously generated text.
4. Save the generated text for later retrieval.

Note: make sure `OPENROUTER_API_KEY` is set up in the Colab Secrets manager.
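
Step 3 above can be sketched in isolation as a rolling-context loop. This is a minimal illustration, not the notebook's actual call: `rolling_generate` and `fake_generate` are hypothetical names, and `fake_generate` stands in for the LLM so the loop can run without network access.

```python
from typing import Callable, Iterable, List


def rolling_generate(chunks: Iterable[str],
                     generate: Callable[[str, str], str]) -> List[str]:
    """Feed each chunk to `generate` together with all text produced so far."""
    sections: List[str] = []
    for chunk in chunks:
        # Everything generated for earlier chunks becomes the context
        previous_context = "".join(sections)
        sections.append(generate(previous_context, chunk))
    return sections


def fake_generate(previous_context: str, chunk: str) -> str:
    # Hypothetical stand-in for the LLM: records how much context it received
    return f"[{len(previous_context)}]{chunk.upper()};"


sections = rolling_generate(["intro", "yield", "cowos"], fake_generate)
print(sections)
```

The context grows with every chunk, so later calls carry more input than earlier ones.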

In [1]:
!pip install -U -q langchain
!pip install -U -q "langchain[openai]"


In [2]:
from google.colab import drive, userdata
from matplotlib import pyplot as plt
import numpy as np
from pathlib import Path
from time import time
from PIL import Image
import base64
import io
import glob
import os
import json



try:
    import gdown
except ImportError:
    # Install gdown only when it is missing from the runtime
    !pip install gdown
    import gdown

from langchain_openai import ChatOpenAI
from langchain.messages import HumanMessage, SystemMessage
from langchain_core.output_parsers import StrOutputParser

In [3]:
drive.mount('/content/drive')
data_path = Path('/content/drive/MyDrive/Colab Notebooks/data')

TRANSCRIPT_PATH = f'{data_path}/transcripts.txt'
KEYFRAME_PATH = f'{data_path}/key_frames'

os.environ["OPENAI_API_KEY"] = userdata.get('OPENROUTER_API_KEY')

Mounted at /content/drive


In [4]:
# Parse transcripts text file by timestamps:

all_lines = []

with open(TRANSCRIPT_PATH, 'r') as f:
    for line in f:
        # Split only on the first ']' so brackets inside the text are preserved
        splits = line.split(']', 1)
        if len(splits) < 2:
            continue
        ts = float(splits[0].strip().replace('[', ''))
        text = splits[1].strip()
        all_lines.append((ts, text))

print(f'Found {len(all_lines)} lines')

Found 731 lines
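
The parsing step above can also be written as a small, testable function. The sketch below assumes each transcript line looks like `[seconds] text`, which is inferred from the parser rather than from the data file itself; `parse_transcript_line` and the sample lines are hypothetical.

```python
import re
from typing import Optional, Tuple

# Leading "[<seconds>]" followed by the transcript text
LINE_PATTERN = re.compile(r"^\s*\[\s*(\d+(?:\.\d+)?)\s*\]\s*(.*)$")


def parse_transcript_line(line: str) -> Optional[Tuple[float, str]]:
    """Return (timestamp, text), or None when the line has no timestamp."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    return float(match.group(1)), match.group(2).strip()


samples = [
    "[12.50] We use CoWoS [see slide] here\n",
    "no timestamp\n",
    "[847.34]   yield matters\n",
]
parsed = [parse_transcript_line(s) for s in samples]
print(parsed)
```

Anchoring the pattern at the start of the line keeps brackets that appear later in the spoken text intact.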


In [5]:
# Load key frame file paths:

all_images = []
for p in glob.glob(f'{KEYFRAME_PATH}/*.jpg'):
    # File names (without extension) are timestamps in seconds
    all_images.append((float(Path(p).stem), p))

print(f'Found {len(all_images)} images')
# all_images.sort(key=lambda x: x[0])

Found 104 images


In [6]:
# Combine and sort by timestamps

all_events = []
for ts, img in all_images:
    # Store the image file path; the image is loaded and encoded later, at chunking time
    all_events.append({'timestamp': ts, 'type': 'image', 'data': img})

for ts, text in all_lines:
    all_events.append({'timestamp': ts, 'type': 'transcript', 'data': text})

# Sort events chronologically
all_events.sort(key=lambda x: x['timestamp'])

In [7]:
from typing import Optional


def encode_image_to_base64(image_path: str, format: str = 'PNG') -> Optional[str]:
    """
    Reads an image file using PIL, saves it into an in-memory buffer,
    and encodes the buffer contents into a Base64 string.

    Args:
        image_path: The file path to the image.
        format: The format to use for the buffer (PNG is recommended for slides).
                Must be a format PIL supports.

    Returns:
        A Base64 encoded string of the image data, or None if the image
        could not be read or encoded.
    """
    try:
        # 1. Read the image file using PIL
        with Image.open(image_path) as img:
            # Ensure image is in RGB format if it's grayscale or otherwise different
            if img.mode != 'RGB':
                img = img.convert('RGB')

            # 2. Save the image data to an in-memory buffer (BytesIO)
            # This avoids writing a temporary file
            buffered = io.BytesIO()
            img.save(buffered, format=format)

            # 3. Encode the bytes from the buffer to Base64
            img_bytes = buffered.getvalue()
            img_base64 = base64.b64encode(img_bytes)

            # 4. Convert bytes to a string for the API payload
            return img_base64.decode('utf-8')

    except FileNotFoundError:
        print(f"Error: The file was not found at {image_path}")
        return None
    except Exception as e:
        print(f"An error occurred: {e}")
        return None

In [8]:
def check_content_blocks_size(content_blocks):
    json_payload_str = json.dumps(content_blocks)
    return len(json_payload_str.encode('utf-8'))
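
As a rough guide to how this size relates to the raw image bytes: Base64 emits 4 characters for every 3 input bytes, so an encoded image is about a third larger than its binary form. The sketch below checks that relationship with zero-filled stand-in bytes instead of a real image; `base64_length` is a hypothetical helper.

```python
import base64
import json
import math


def base64_length(num_bytes: int) -> int:
    """Length of the padded Base64 encoding of `num_bytes` bytes."""
    return 4 * math.ceil(num_bytes / 3)


raw = bytes(300)  # stand-in for image bytes, not a real image
encoded = base64.b64encode(raw).decode("utf-8")
block = [{
    "type": "image_url",
    "image_url": {"url": f"data:image/png;base64,{encoded}"},
}]
payload_bytes = len(json.dumps(block).encode("utf-8"))

print(len(raw), len(encoded), payload_bytes)
```

The JSON wrapper adds a small constant overhead per block on top of the encoded data.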

In [9]:
def yield_messages(events, max_size=1e6):
    content_blocks = []

    for i, item in enumerate(events):
        time_str = f"{item['timestamp']:.2f}" # use seconds for correlation with image names

        if item['type'] == 'image':
            # only break at images for coherence
            if check_content_blocks_size(content_blocks) > max_size:
                yield content_blocks
                content_blocks = []
            if i + 1 < len(events) and events[i+1]['type'] == 'image':
                # skip continuous key frames for videos and quick scrolls
                print(f'Skipping continuous key frame {time_str}')
                continue
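            base64_image = encode_image_to_base64(item['data'])
            if base64_image is None:
                # Skip frames that failed to load rather than sending "None" as image data
                continue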
            content_blocks.append({
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"
                }
            })
            content_blocks.append({
                "type": "text",
                "text": f"[Timestamp {time_str} s]: Above image is a visual slide content [ImageName: {time_str}] related to the ongoing discussion as follows. Please quote if it adds significant value. Skip if it is not very meaningful."
            })
        elif item['type'] == 'transcript':
            content_blocks.append({
                "type": "text",
                "text": f"[Timestamp {time_str}]: Transcript excerpt: '{item['data']}'"
            })
    if content_blocks:
        # Emit the final partial chunk, but never an empty one
        yield content_blocks
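
The break-only-at-images rule can be seen more clearly on toy data. This is a simplified sketch: it counts items rather than measuring JSON payload bytes, and `chunk_at_images` and the sample events are hypothetical.

```python
from typing import Dict, Iterator, List


def chunk_at_images(events: List[Dict], max_items: int) -> Iterator[List[Dict]]:
    """Yield groups of events, splitting only right before an image event."""
    current: List[Dict] = []
    for event in events:
        # A chunk may exceed max_items; it is closed at the next image
        if event["type"] == "image" and len(current) >= max_items:
            yield current
            current = []
        current.append(event)
    if current:
        yield current


events = [
    {"timestamp": 1.0, "type": "image"},
    {"timestamp": 2.0, "type": "transcript"},
    {"timestamp": 3.0, "type": "transcript"},
    {"timestamp": 4.0, "type": "transcript"},
    {"timestamp": 5.0, "type": "image"},
    {"timestamp": 6.0, "type": "transcript"},
]
chunks = list(chunk_at_images(events, max_items=2))
sizes = [len(c) for c in chunks]
print(sizes)
```

Because the limit is only checked at images, it acts as a soft threshold: a long run of transcript lines stays attached to the slide it follows.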


In [22]:
model = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://openrouter.ai/api/v1",
    model="google/gemini-2.0-flash-001" # "google/gemini-2.0-flash-exp:free"
)

chain = model | StrOutputParser()

system_message = SystemMessage(content="""
You are a rigorous technical writer assembling a technical document from streamed, timestamped chunks of slides/key frames and transcript captured from a meeting or presentation.
Please follow the following steps and rules:
1) First correct any miscaptured words in the transcripts by strictly referring to the slides, especially for terminology and acronyms.
2) Write an accurate, detailed, and comprehensive professional technical document, based on the information parsed from the meeting key frames and corrected transcripts, together with the text generated in earlier sections as the context (if provided).
3) Always cross-check all the info, especially terminology, acronyms, and numbers, between key frames, transcripts, and previous context.
4) Write the new content as a smooth continuation of the previous context (if any) but do not repeat or rewrite the previous context. Skip anything that was already mentioned in the previous context.
5) Focus on accurate metrics, decisions made, and key action items. Record conflicts explicitly if any.
6) Never invent metrics, owners, or dates. If unknown, write “TBD”. Do not guess.
7) Explicitly quote the names of the key frame in the format of `[ImageName: XXX.XX]` when rewriting based on the transcripts that are discussing the key frame, so that the readers know which frame the ongoing discussion is about.
8) Do not quote timestamps.
9) Use proper LaTeX syntax for inline or block formulas.
""")

all_generations = ''
total_events = len(all_events)
chunk_seperations = []
processed_events = 0
for i, content_blocks in enumerate(yield_messages(all_events, max_size=5e6)):
    processed_events += len(content_blocks)
    ts = content_blocks[-1]['text'].split(']')[0] + ']'
    # if i > 2:
    #     break
    human_message_context = HumanMessage(content=[{"type":"text", "text": f"---------Previous Context--------\n{all_generations}\n---------End of Previous Context--------\n"}])
    human_message_new = HumanMessage(content=content_blocks)
    all_generations += chain.invoke([system_message, human_message_context, human_message_new])
    print(f'\nProcessed Chunk #{i+1}, {processed_events} / {total_events} content blocks @ {ts}, Generated {len(all_generations)} Characters')
    chunk_seperations.append(len(all_generations))




Processed Chunk #1, 210 / 835 content blocks @ [Timestamp 847.34], Generated 3581 Characters

Processed Chunk #2, 282 / 835 content blocks @ [Timestamp 1178.70], Generated 4933 Characters

Processed Chunk #3, 306 / 835 content blocks @ [Timestamp 1248.28], Generated 6912 Characters
Skipping continuous key frame 1417.00

Processed Chunk #4, 364 / 835 content blocks @ [Timestamp 1463.44], Generated 7975 Characters
Skipping continuous key frame 1470.00
Skipping continuous key frame 1471.00
Skipping continuous key frame 1472.00
Skipping continuous key frame 1475.00
Skipping continuous key frame 1479.00

Processed Chunk #5, 377 / 835 content blocks @ [Timestamp 1481.42], Generated 8263 Characters
Skipping continuous key frame 1482.00
Skipping continuous key frame 1484.00
Skipping continuous key frame 1485.00
Skipping continuous key frame 1486.00

Processed Chunk #6, 389 / 835 content blocks @ [Timestamp 1492.46], Generated 8448 Characters
Skipping continuous key frame 1494.00
Skipping cont

In [23]:
output_round1 = all_generations
with open(f'{data_path}/output_round1.txt', 'w') as f:
    f.write(output_round1)

In [None]:
for i, chunk_seperation in enumerate(chunk_seperations):
    if i == 0:
        print(output_round1[0:chunk_seperation])
    else:
        print(output_round1[chunk_seperations[i-1]:chunk_seperation])
    print(f'\n\n--------- Chunk Seperation {i} ---------\n\n')

The presentation focuses on TSMC and Intel's packaging process technologies, specifically CoWoS, EMIB, Foveros, and chiplets [ImageName: 1.00]. TSMC is recognized as a leader in semiconductor manufacturing, with strengths in both high yield of advanced nanometer nodes and in packaging technology. Among TSMC, Samsung, and Intel, TSMC's packaging process, including CoWoS and 3D fabrics, stands out. The presentation will primarily focus on CoWoS.

The Key Performance Indicators (KPIs) for packaging technology, often referred to as P3C2, include performance, power, packaging profile, cycle time, and cost [ImageName: 90.00]. Performance encompasses bandwidth (BW), Fmax, and functionality (Hi, Function). Power considers efficiency, thermal properties (Tj). The packaging profile involves footprint and thickness. Cycle time refers to the speed to market, and cost relates to customer affordability. From an engineering perspective, yield and reliability are critical factors in process technology. The industry trend is towards heterogeneous system integration with continued pitch scaling, aiming for smaller dimensions to pack more components into a single package.

TSMC's 3D Fabric Technology Portfolio includes both 3D silicon stacking as well as advanced packaging [ImageName: 209.00]. Silicon stacking includes SoIC (System on Integrated Chips) with options for bumped (SoIC-P) and bumpless (SoIC-X) configurations. Advanced packaging options include CoWoS (Chip-on-Wafer-on-Substrate) with silicon interposers (CoWoS-S) or RDL (Redistribution Layer) interposers (CoWoS-L/R), as well as InFO (Integrated Fan-Out) options like InFO-PoP, InFO-2.5D, and InFO-3D. The focus here is on CoWoS, especially the use of silicon interposers.

TSMC's CoWoS updates are primarily aimed at High-Performance Computing (HPC) applications requiring integrated advanced logic and HBM (High Bandwidth Memory) [ImageName: 266.00]. TSMC supports more than 140 CoWoS products from over 25 customers. A key development is the 6x reticle-size (~5,000 mm²) RDL interposer, capable of accommodating 12 stacks of HBM memory. The AI server market, estimated at $1 trillion through 2030 in the US alone, drives the need to integrate high-bandwidth memory and logic. Advanced packaging is costly and typically only affordable for larger customers who require a high volume of chips per package. Apple, Nvidia, AMD, and Qualcomm are among the major customers. TSMC is developing a 6x reticle size RDL interposer, where the redistribution layer is buried inside the silicon, enabling signal routing within the silicon interposer. 6x reticle size is a significant advancement from the 3x reticle size in 2021.

The reticle size indicates the maximum size of a die that a lithography tool can print, currently 26mm x 33mm [ImageName: 553.00]. This translates to a field size of 858 mm², a standard value across most lithography tools. The mask size is typically four times larger than the printed image (4x).
It's difficult to produce die sizes larger than 26mm x 33mm in a single print. While multiple dies can be placed within the reticle, they must conform to the rectangular shape. High-end lithography A processes print with a mask that is 26x33 at 4X and XX, but AX, AX in Y axis, indicating the need for double printing for larger dies. Intel's pursuit of larger masks (nine-inch instead of the current six-inch) to avoid double printing faces challenges due to commitment issues. A 6x reticle size equates to 5,148 mm² (26x33 times 6), which, according to TSMC, is a universally understood unit.



--------- Chunk Seperation 0 ---------


The NVIDIA H100 GPU, utilizing TSMC's CoWoS technology, features a large 814 mm² GH100 GPU die and six memory chips [ImageName: 849.00]. The GH100 GPU consists of 80 billion transistors and is fabricated using the N4 process. The limitation in die size, constrained by the reticle size of 858 mm², impacts the design. The crucial question is the yield of such a large die. [ImageName: 936.00] discusses the die yield. TSMC achieves an 80% yield for Apple's A13 chip, which has a die size of approximately 1 cm x 1 cm (100 mm²). However, for larger GPUs like the GH100 (die size > 8 cm²), TSMC's yield is expected to drop to around 30-40% due to Poisson statistics. A formula is presented that shows the yield reduction as a function of die area:

$y(A) = (1 + \frac{A * 0.223}{2})^{-2}$

where \( y(A) \) is the yield and \( A \) is the die size in cm². The larger the die, the lower the yield.

The video explains that particle contamination on a wafer is a random event, analogous to bombs landing in London during World War II, following a Poisson distribution [ImageName: 1001.00, 1144.00]. R. Clark's analysis of bomb hits in London showed that the distribution was random and could be modeled using Poisson statistics. The contamination of particles on a wafer surface is assumed to be random events. The TSMC 5nm yield for Nvidia is around 80%.



--------- Chunk Seperation 1 ---------


The presentation further details the yield model and influence of surface particles [ImageName: 1185.00]. A "killer defect density value for Surface Prep" is calculated based on Poisson's model for yield, achieving 99%. Values have improved from 2004 (97%) to 2005 (94.2%) for critical particle count per wafer. Current models are utilized for Yield Enhancement and Starting Materials Surface Prep. Similar particle density values apply to groups of differing equations. Yield Enhancement (YE) and Fabrication Process Integration (FPI) jointly determine appropriate Starting Materials' defect levels and establish specific "call outs" for critical cleans (specifically pre-gate). Concern remains regarding the accurate measurement of defects at a critical particle diameter that contributes to yield.

The concept of Poisson distribution is further highlighted, stating that "particles/defects arriving on the wafer" is like bombs landing in London [ImageName: 1185.00]. A simple yield model can be expressed assuming no defects before time *t* as:

$P(k=0) = e^{-\mu} = e^{-DA}$

where $D$ = defect density, and $A$ = critical area.

The presentation then introduces a "Classic Yield Model Modified by ITRS" [ImageName: 1198.00]:

$Y = f(A, D)$

Where $D$ = defect density, which has a distribution with respect to the defect size, and $A$ = critical area in which a defect has a high probability of resulting in a fault, also has a distribution with respect to the defect size.

$Y = \int_{0}^{\infty} e^{-DA} f'(D) dD$

where

$f(D) = [\Gamma(\alpha)B^{\alpha}]^{-1} D^{\alpha-1} e^{-D/B}$

A negative binomial distribution is also provided:

$Y_r = (1 + \frac{AD}{\alpha})^{-\alpha}$

[ImageName: 1211.00] provides an example: TSMC's yield rate is reported to be 80% for the Apple A13 chip, which has a die size of approximately 0.44 cm x 0.44 cm.
The value of $\alpha$ is suggested by ITRS as $\alpha=2$. Also provided is a formula:

 $y(A) = (1 + \frac{A * 1.15}{2})^{-2}$



--------- Chunk Seperation 2 ---------


Continuing the discussion on die yield, it's observed that yield decreases significantly as die size increases. While continuous effort can improve yield, it's unlikely to reach the high levels (80-90%) seen in smaller dies. This has a significant impact on factory capacity planning because more wafers are needed to produce the same number of functional dies. TSMC is offering Nvidia a "super hot run status," prioritizing their wafers to run faster, potentially doubling the speed, to maintain die output and meet delivery dates. The presenter emphasizes that die size matters significantly in yield considerations.

Subsequently, the presentation transitions to discussing the chronicle of CoWoS [ImageName: 1397.00]. It's suggested to explore TSMC's website to better understand the company's strategy (https://www.tsmc.com/english/dedicatedFoundry/technology/cowos). The presenter expresses interest in the Nvidia Tesla V100 [ImageName: 1463.00] which features high-performance computing (HPC) capabilities and 32GB of memory, often utilized as an AI chip.



--------- Chunk Seperation 3 ---------


The presentation shifts its focus to analyzing the performance and functionality of chips, specifically within the context of high-performance computing [ImageName: 1465.00, 1477.00, 1478.00, 1480.00]. The discussion emphasizes the growing importance of high-performance computing (HPC).



--------- Chunk Seperation 4 ---------


The presentation continues to emphasize the growing importance of high-performance computing (HPC) and its relevance to various markets [ImageName: 1483.00, 1487.00, 1490.00, 1492.00].



--------- Chunk Seperation 5 ---------


The high-performance computing market is where the money is in the packaging world [ImageName: 1499.00].


--------- Chunk Seperation 6 ---------


The presenter suggests visiting TSMC's website to understand CoWoS (https://www.tsmc.com/english/dedicatedFoundry/technology/cowos) [ImageName: 1397.00, 1509.00]. Fundamentally, CoWoS involves integrating chips onto a silicon interposer.

[ImageName: 1519.00] illustrates a Si interposer which acts as a bridge or conduit allowing electrical signals to pass through it. The signal lines are embedded within the interposer, which are becoming increasingly complex, with more advanced routing capabilities. The goal is to place components as close as possible to minimize speed loss, ideally stacking them. While memory stacking is feasible, placing memory directly on top of the CPU is challenging due to stress and differing thermal properties (Tj) among components and varying voltage requirements between I/O, CPU, and memory. Silicon has reasonable thermal conductivity and a high Young's modulus (130-150 GPa), making it suitable for creating fine lines within the interposer.

[ImageName: 1663.00] shows the silicon interposer flow. Die 1 and Die 2 are connected via the purple interposer to the packaging substrate, which features a PCB ball grid array. The interposer sits between the die and the package. The components are connected via high-speed links using pre-manufactured routing lines inside the interposer. Ensuring proper connection is crucial for signal transmission between the two dies. Failure to do so renders the assembly non-functional. Packaging yield is extremely important. Connecting two good dies on the interposer is essential; otherwise, both dies are lost, with no opportunity for rework. This increases the cost, potentially doubling it, especially considering the cost of the silicon interposer.

The connection between the silicon interposer's front and back surfaces are made using what is called a Through Silicon Via (TSV). TSVs are expensive due to the use of copper. TSVs are large (tens of microns deep), and the electroplating process is slow. The cycle time is very important in P3C2. It takes a long time to "dig" such deep vias and then fill them with copper.



--------- Chunk Seperation 7 ---------


The presentation provides a detailed comparison between TSMC's CoWoS and Intel's EMIB.

[ImageName: 1898.00] illustrates that CoWoS is a 2.5D IC (TSV) interposer-based packaging technology designed by TSMC for high-performance applications. The diagram shows HBM (High Bandwidth Memory) DRAM dies are stacked with TSVs (Through Silicon Vias). The Compute Logic die is connected to the package substrate using C4 Cu bumps. Microbumps are used to connect the HBM DRAM dies to the Logic die, with short wires for connections within the package and longer wires outside. The diagram also provides a more detailed view of the interposer layout, showing Metal 1, Metal 2, and Metal 3 layers, CBM (Cu Ball Metallization), HK (High-K dielectric), C4 connections, and passivation layers. The presenter notes that TSMC excels in making this process work. Although Intel is aware of it, reverse engineering by companies reveals all the materials and technologies used, however, TSMC's hidden IP and process technology lead to high yield, giving them the key advantage. The presenter will skip the detailed discussion of each layer to focus on the overall comparison, highlighting the broader picture.

[ImageName: 2041.00] shows Intel's EMIB (Embedded Multi-die Interconnect Bridge), which uses 2D scaling with Si bridges embedded in organic substrates. I/O or bumps are placed at the edge of the die with pitches of 55-36 μm. EMIB does not require costly TSVs, which makes it flexible, but the presenter is not convinced. The silicon bridge is embedded inside an organic substrate and they talk at the edge of the material and transfer important signals. The presenter expresses reservations about the scaling potential of Intel's EMIB compared to TSMC. Embedding the silicon "bridge" and creating trenches within the organic material on top could pose challenges for future scaling, especially when dealing with copper inside plastic at very small dimensions compared to silicon. The presenter acknowledges the flexibility of EMIB in connecting different dies but questions its advantages over silicon interposers.

The presenter then references an Intel advertisement video [ImageName: 2105.00, 2142.00, 2149.00, 2217.00, 2304.00]. Intel acknowledges that the size of a 2.5D solution is limited. TSMC is moving to solve that problem to make it so big with its 3x and 4x reticle sizes in 2023, moving to 6x reticle sizes soon. The presenter concludes on this point that Intel's claim that CoWoS is limited is no longer valid.



--------- Chunk Seperation 8 ---------


The presenter discussed the pros and cons of Intel's EMIB [ImageName: 2355.00], noting that it leverages existing organic packaging technology and enables larger die counts and package configurations. It is lower in cost than a full-size silicon interposer and supports high data rate signaling between adjacent dies, using simple driver/receiver circuitry. EMIB also offers the ability to optimize each die-to-die link individually by customizing the bridge for that link.
However, EMIB has cons that include additional complexity in die bumping and the package assembly process, disparate CTE (Coefficient of Thermal Expansion) between the package, the die, and the EMIB bridge,.

AWS is the first EMIB customer [ImageName: 2458.00], utilizing dies fabricated by TSMC at 5nm and packaged by Intel. The presenter refers to the Graviton3 processor, which is chiplet-based with seven silicon dies and 55 billion transistors. It delivers 25% higher performance/core vs. Graviton2 and includes the first DDR5 system in AWS data centers, providing 50% more DDR bandwidth.

[ImageName: 2509.00] compares CoWoS and EMIB. CoWoS benefits from dense fine pitch interconnects and CTE matching, which reduces stresses on the die's backend, reducing the likelihood of failures in the low-K ILD. It also makes chip attach easier due to better CTE matching and lower costs than TSV-based Si interposers. Concerns include the interposer size being limited by reticle field and cost, TSV capacitance impacting signal integrity for signals in off-package links, higher insertion losses in silicon, and complex assembly. EMIB benefits from dense fine pitch interconnects and localized high-density wiring, ensuring the on-package interconnect is not affected by the presence of the bridge. It claims no practical limits to die size and is a key advantage. It leverages existing organic substrate manufacturing, making bridge manufacturing simpler than interposer manufacturing since TSV processes are not needed. Bridge silicon costs are lower than silicon interposers due to the lack of TSVs. EMIB concerns include increased organic substrate manufacturing complexity.

The presenter states that the limitation of the interposer size for CoWoS is no longer valid with TSMC's advancements in reticle sizes. Additionally, despite CoWoS's challenges with stress, EMIB faces thermal problems due to the use of polymers. The presenter concludes that high yield and fast throughput are critical, and introducing different proposals to the factory to make everything work is a challenging task.

Intel is also working on other new technologies like Foveros [ImageName: 2588.00], which involves die-to-die stacking with direct copper bonding. With this technology, they don't need TSVs anymore.



--------- Chunk Seperation 9 ---------


[ImageName: 2603.00] shows that Foveros has a fancy interposer (base). The silicon base die has active circuitry relevant for the full operation of the main compute processors found in the top piece of silicon. The bump pitch is 36μm.

The presenter explains that Foveros involves two silicon components: a complete top silicon and a base. The computing components are placed on top, connected to the base die via micro-balls. This "bomb" refers to direct copper bonding, where connections are pre-manufactured inside each wafer, top and bottom [ImageName: 2653.00]. They are then bonded together directly.

Direct copper bonding, possibly invented by IBM [ImageName: 2665.00], involves polishing the bottom of the wafer and thinning it down before bonding. Typically, epoxy is used to bond them together.



--------- Chunk Seperation 10 ---------


Direct copper bonding requires careful consideration of the materials used to achieve effective bonding, especially given the different thermal expansion coefficients between silicon dioxide (\(SiO_2\)) and copper (Cu) [ImageName: 2692.00]. The presenter states that tensile stress could be induced due to the thermal stress and large-sized dies could cause reduced process margin. The presenter said that he/she is going to present a lot of seminars related to this bonding technology in the future. There are reports that Intel is working on it but not confirmed.

The presentation shifts to a discussion on the pros of chiplets [ImageName: 2779.00]. Chiplets offer improved die yield and flexibility, enabling the combination of different technologies for complex systems through massive parallel mounting with high accuracy (<< 1 μm).



--------- Chunk Seperation 11 ---------


The presentation discusses TSMC's strengths and provides insight into why it is more successful than its competitors, Samsung and Intel [ImageName: 3229.00]. TSMC focuses on easing into new technologies to establish a solid baseline. TSMC attracts numerous customers to facilitate process learning, emphasizing that it won't compete with its customers. This encourages more customers to trust TSMC with their designs without concerns about IP theft. TSMC also cultivates local suppliers to create a flywheel effect, fostering close collaboration within the Taiwanese ecosystem. This collaborative approach forms a positive feedback loop for yield learning. TSMC, together with qualified suppliers use as many chemicals, tools and parts as they can, helping the suppliers develop new specifications.

The presenter explains that the flywheel effect refers to building and linking together many suppliers with TSMC[ImageName: 3417.00]. Small wheels (suppliers) turn and assist the big wheel (TSMC), and the big wheel, in turn, helps the small wheels to turn.



--------- Chunk Seperation 12 ---------


TSMC fosters local suppliers to enable a flywheel effect, creating a positive feedback loop for yield learning [ImageName: 3444.00]. TSMC attracts numerous customers, which facilitates process learning and provides valuable insights into diverse customer demands, something an IDM (Integrated Device Manufacturer) like Intel lacks. [ImageName: 3480.00, 3567.00] shows TSMC's IP portfolio which is mind-bogglingly large, counting 21,000 IP titles from 0.35um to 5nm. These IPs are free to use for designers. TSMC works with multiple customers and suppliers, forming work groups to solve problems and develop new IPs, inventions, that are shared with everyone else, attracting more customers and creating a positive feedback loop.
Samsung and Intel's IP portfolios are smaller than TSMC's because their business models are not as customer-focused from the beginning. Intel is now breaking away and renaming their manufacturing group into a foundry.

The presentation concludes the discussion of TSMC and Intel, CoWoS, EMIB, Foveros, and chiplets [ImageName: 3604.00]. The presenter hopes the audience has a good understanding of the big picture to guide future deep dives into the process technology aspects. It's crucial to understand the problems being solved and which customers are being served.



--------- Chunk Seperation 13 ---------
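
The negative binomial yield expression quoted in the generated text, $Y = (1 + AD/\alpha)^{-\alpha}$ with $\alpha = 2$, can be evaluated numerically as a sanity check on the reported figures. The sketch below plugs in the coefficients and die sizes that appear in the output above (0.223 with a 1 cm² die, 1.15 with a 0.44 cm x 0.44 cm die, and the 814 mm² GH100 die). Treating the coefficients as defect densities per cm² is an assumption, and `negative_binomial_yield` is a hypothetical helper.

```python
def negative_binomial_yield(area_cm2: float, defect_density: float,
                            alpha: float = 2.0) -> float:
    """Evaluate Y = (1 + A * D / alpha) ** (-alpha)."""
    return (1.0 + area_cm2 * defect_density / alpha) ** (-alpha)


# Coefficients and die sizes as quoted in the generated text (units assumed)
y_1cm2 = negative_binomial_yield(1.0, 0.223)
y_gh100 = negative_binomial_yield(8.14, 0.223)
y_small = negative_binomial_yield(0.44 * 0.44, 1.15)

print(f"{y_1cm2:.3f} {y_gh100:.3f} {y_small:.3f}")
```

Both small-die cases come out near 0.81, in line with the quoted 80% yield, while the 8.14 cm² case comes out near 0.27, slightly below the 30-40% range stated in the text.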




In [27]:
len(output_round1)

19944

In [28]:
# review step - 2nd pass

system_message = SystemMessage(content="""
You are a careful technical reviewer reviewing and improving a technical document assembled by a technical writer from streamed, timestamped chunks of slides/key frames and transcript captured from a meeting or presentation.
Please follow the following steps and rules:
1) Always output the complete draft of the improved technical document directly. It should be of similar length to the original draft. Don't delete any details or shorten any sections.
2) The key frames and transcripts you receive are only relevant to part of the draft. Focus on improving the relevant sections only. Do not change other sections. Cross-check the entire draft for consistency of the changes made.
3) The transcripts may contain miscaptured words, which the original writer tried to correct in the original draft. Please cross-check all the info, especially terminology, acronyms, and numbers, between key frames, transcripts, and the entire draft.
4) Double-check accurate metrics, decisions made, and key action items. Highlight conflicts explicitly if any.
5) Never invent metrics, owners, or dates. If unknown, write “TBD”. Do not guess.
6) Preserve the quoted names of the key frame in the format of `[ImageName: XXX.XX]` in the draft. If the original writer missed quoting any key frame during discussion, please add them back in.
7) Do not quote timestamps.
8) Use proper LaTeX syntax for inline or block formulas.
""")

# Start the review from the first-pass draft; each iteration replaces it with the reviewed draft.
all_generations = output_round1
total_events = len(all_events)
# chunk_separations = []
processed_events = 0
for i, content_blocks in enumerate(yield_messages(all_events, max_size=7e6)):
    processed_events += len(content_blocks)
    # Timestamp tag of the last content block in this chunk, e.g. "[Timestamp 989.90]"
    ts = content_blocks[-1]['text'].split(']')[0] + ']'
    # if i > 2:
    #     break
    # Feed the whole current draft as context, followed by the new key frames and transcript lines
    human_message_context = HumanMessage(content=[{"type":"text", "text": f"---------Previous Draft--------\n{all_generations}\n---------End of Previous Draft--------\n"}])
    human_message_new = HumanMessage(content=content_blocks)
    all_generations = chain.invoke([system_message, human_message_context, human_message_new])  # reviewed draft replaces the previous one
    # Checkpoint the draft after every chunk so a failed run can be inspected or resumed
    with open(f'{data_path}/output_round2_{i}.txt', 'w') as f:
        f.write(all_generations)
    print(f'\nProcessed Chunk #{i+1}, {processed_events} / {total_events} content blocks @ {ts}, Generated {len(all_generations)} Characters')
    # chunk_separations.append(len(all_generations))


Processed Chunk #1, 248 / 835 content blocks @ [Timestamp 989.90], Generated 19948 Characters

Processed Chunk #2, 295 / 835 content blocks @ [Timestamp 1209.42], Generated 20139 Characters
Skipping continuous key frame 1417.00

Processed Chunk #3, 364 / 835 content blocks @ [Timestamp 1463.44], Generated 20207 Characters
Skipping continuous key frame 1470.00
Skipping continuous key frame 1471.00
Skipping continuous key frame 1472.00
Skipping continuous key frame 1475.00
Skipping continuous key frame 1479.00
Skipping continuous key frame 1482.00

Processed Chunk #4, 380 / 835 content blocks @ [Timestamp 1483.00], Generated 20260 Characters
Skipping continuous key frame 1484.00
Skipping continuous key frame 1485.00
Skipping continuous key frame 1486.00
Skipping continuous key frame 1494.00
Skipping continuous key frame 1495.00
Skipping continuous key frame 1497.00
Skipping continuous key frame 1498.00

Processed Chunk #5, 395 / 835 content blocks @ [Timestamp 1499.58], Generated 20281 

In [None]:
print(all_generations)

The presentation focuses on TSMC and Intel's packaging process technologies, specifically CoWoS, EMIB, Foveros, and chiplets [ImageName: 1.00]. TSMC is recognized as a leader in semiconductor manufacturing, with strengths in both high yield of advanced nanometer nodes and in packaging technology. Among TSMC, Samsung, and Intel, TSMC's packaging process, including CoWoS and 3D fabrics, stands out. The presentation will primarily focus on CoWoS.

The Key Performance Indicators (KPIs) for packaging technology, often referred to as P3C2, include performance, power, packaging profile, cycle time, and cost [ImageName: 90.00, 1882.00]. Performance encompasses bandwidth (BW), Fmax, and functionality ("Hi, Function"). Power considers efficiency and thermal properties (Tj). The packaging profile involves footprint and thickness. Cycle time refers to the speed to market, and cost relates to customer affordability. From an engineering perspective, yield and reliability are critical factors in process technology. The industry trend is towards heterogeneous system integration with continued pitch scaling, aiming for smaller dimensions to pack more components into a single package.

TSMC's 3D Fabric Technology Portfolio includes both 3D silicon stacking as well as advanced packaging [ImageName: 209.00]. Silicon stacking includes SoIC (System on Integrated Chips) with options for bumped (SoIC-P) and bumpless (SoIC-X) configurations. Advanced packaging options include CoWoS (Chip-on-Wafer-on-Substrate) with silicon interposers (CoWoS-S) or RDL (Redistribution Layer) interposers (CoWoS-L/R), as well as InFO (Integrated Fan-Out) options like InFO-PoP, InFO-2.5D, and InFO-3D. The focus here is on CoWoS, especially the use of silicon interposers.

TSMC's CoWoS updates are primarily aimed at High-Performance Computing (HPC) applications requiring integrated advanced logic and HBM (High Bandwidth Memory) [ImageName: 266.00]. TSMC supports more than 140 CoWoS products from over 25 customers. A key development is the 6x reticle-size (~5,000 mm²) RDL interposer, capable of accommodating 12 stacks of HBM memory. The AI server market, estimated at $1 trillion through 2030 in the US alone, drives the need to integrate high-bandwidth memory and logic. Advanced packaging is costly and typically only affordable for larger customers who require a high volume of chips per package. Apple, Nvidia, AMD, and Qualcomm are among the major customers. TSMC is developing a 6x reticle size RDL interposer, where the redistribution layer is buried inside the silicon, enabling signal routing within the silicon interposer. 6x reticle size is a significant advancement from the 3x reticle size in 2021.

The reticle size indicates the maximum size of a die that a lithography tool can print, currently 26mm x 33mm [ImageName: 553.00]. This translates to a field size of 858 mm², a standard value across most lithography tools. The mask size is typically four times larger than the printed image (4x).
It's difficult to produce die sizes larger than 26mm x 33mm in a single print. While multiple dies can be placed within the reticle, they must conform to the rectangular shape. High-end lithography processes print with a mask that is 26mm x 33mm at 4X in both X and Y axes, indicating the need for double printing for larger dies. Intel's pursuit of larger masks (nine-inch instead of the current six-inch) to avoid double printing faces challenges due to commitment issues. A 6x reticle size equates to 5,148 mm² (26x33 times 6), which, according to TSMC, is a universally understood unit.
The NVIDIA H100 GPU, utilizing TSMC's CoWoS technology, features a large 814 mm² GH100 GPU die and six memory chips [ImageName: 849.00]. The GH100 GPU consists of 80 billion transistors and is fabricated using the N4 process. The limitation in die size, constrained by the reticle size of 858 mm², impacts the design. The crucial question is the yield of such a large die. [ImageName: 936.00] discusses the die yield. TSMC's yield for Apple's A13 chip, which has a die size of approximately 0.44 cm x 0.44 cm, is reported to be 80%. However, for larger GPUs like the GH100 (die size > 8 cm²), TSMC's yield is expected to drop to around 30-40% due to Poisson statistics. A formula is presented that shows the yield reduction as a function of die area:

$Y(A) = (1 + \frac{A \times 0.223}{2})^{-2}$

where \( Y(A) \) is the yield and \( A \) is the die size in cm². The larger the die, the lower the yield.

The video explains that particle contamination on a wafer is a random event, analogous to bomb hits in London during World War II, following a Poisson distribution [ImageName: 1001.00, 1144.00]. As R. Clark's analysis of bomb hits in London showed, the distribution was random and could be modeled using Poisson statistics. [ImageName: 1001.00] illustrates this analysis, dividing the London area into grids and comparing actual bomb hits with those predicted by the Poisson distribution to demonstrate the randomness. The contamination of particles on a wafer surface is assumed to be random events. The TSMC 5nm yield for Nvidia is around 80%.
The presentation further details the yield model and influence of surface particles [ImageName: 1185.00]. A "killer defect density value for Surface Prep" is calculated based on Poisson's model for yield, achieving 99%. Values have improved from 2004 (97%) to 2005 (94.2%) for critical particle count per wafer. Current models are utilized for Yield Enhancement and Starting Materials Surface Prep. Similar particle density values apply to groups of differing equations. Yield Enhancement (YE) and Fabrication Process Integration (FPI) jointly determine appropriate Starting Materials' defect levels and establish specific "call outs" for critical cleans (specifically pre-gate). Concern remains regarding the accurate measurement of defects at a critical particle diameter that contributes to yield.

The concept of Poisson distribution is further highlighted, stating that "particles/defects arriving on the wafer" is like bombs landing in London [ImageName: 1185.00]. A simple yield model can be expressed assuming no defects before time *t* as:

$P(k=0) = e^{-\mu} = e^{-DA}$

where $D$ = defect density, and $A$ = critical area.

The presentation then introduces a "Classic Yield Model Modified by ITRS" [ImageName: 1198.00]:

$Y = f(A, D)$

Where $D$ = defect density, which has a distribution with respect to the defect size, and $A$ = critical area in which a defect has a high probability of resulting in a fault, also has a distribution with respect to the defect size.

$Y = \int_{0}^{\infty} e^{-DA} f'(D) dD$

where

$f'(D) = [\Gamma(\alpha)B^{\alpha}]^{-1} D^{\alpha-1} e^{-D/B}$

A negative binomial distribution is also provided:

$Y_r = (1 + \frac{AD}{\alpha})^{-\alpha}$

[ImageName: 1211.00] provides an example: TSMC's yield rate is reported to be 80% for the Apple A13 chip, which has a die size of approximately 0.44 cm x 0.44 cm.
The value of $\alpha$ is suggested by ITRS as $\alpha=2$. Also provided is a formula:

 $Y(A) = (1 + \frac{A \times 1.15}{2})^{-2}$

Continuing the discussion on die yield, it's observed that yield decreases significantly as die size increases. While continuous effort can improve yield, it's unlikely to reach the high levels (80-90%) seen in smaller dies. [ImageName: 1249.00] shows TSMC N5/N7 Yield. This has a significant impact on factory capacity planning because more wafers are needed to produce the same number of functional dies. TSMC recently announced that they will give Nvidia a "super hot run status," prioritizing their wafers to run faster, potentially doubling the speed, to maintain die output and meet delivery dates. The presenter emphasizes that die size matters significantly in yield considerations.

Subsequently, the presentation transitions to discussing the chronicle of CoWoS [ImageName: 1397.00]. It's suggested to explore TSMC's website to better understand the company's strategy (https://www.tsmc.com/english/dedicatedFoundry/technology/cowos) [ImageName: 1509.00]. The presenter expresses interest in the Nvidia Tesla V100 [ImageName: 1463.00] which features high-performance computing (HPC) capabilities and 32GB of memory, often utilized as an AI chip.
The presentation shifts its focus to analyzing the performance and functionality of chips, specifically within the context of high-performance computing [ImageName: 1465.00, 1477.00, 1478.00, 1480.00]. The discussion emphasizes the growing importance of high-performance computing (HPC).
The presentation continues to emphasize the growing importance of high-performance computing (HPC) and its relevance to various markets [ImageName: 1483.00, 1487.00, 1490.00, 1492.00, 1493.00].
The high-performance computing market is where the money is in the packaging world [ImageName: 1499.00].The presenter suggests visiting TSMC's website to understand CoWoS (https://www.tsmc.com/english/dedicatedFoundry/technology/cowos). Fundamentally, CoWoS involves integrating chips onto a silicon interposer.

[ImageName: 1519.00] illustrates a Si interposer which acts as a bridge or conduit allowing electrical signals to pass through it. The signal lines are embedded within the interposer and are becoming increasingly complex with more advanced routing capabilities. The goal is to place components as close as possible to minimize speed loss, ideally stacking them. While memory stacking is feasible, placing memory directly on top of the CPU is challenging due to stress and differing thermal properties (Tj) among components and varying voltage requirements between I/O, CPU, and memory. Silicon has reasonable thermal conductivity and a high Young's modulus (130-150 GPa), making it suitable for creating fine lines within the interposer.

[ImageName: 1663.00] shows the silicon interposer flow. Die 1 and Die 2 are connected via the purple interposer to the packaging substrate, which features a PCB ball grid array. The interposer sits between the die and the package. The components are connected via high-speed links using pre-manufactured routing lines inside the interposer. Ensuring proper connection is crucial for signal transmission between the two dies. Failure to do so renders the assembly non-functional. Packaging yield is extremely important. Connecting two good dies on the interposer is essential; otherwise, both dies are lost, with no opportunity for rework. This increases the cost, potentially doubling it, especially considering the cost of the silicon interposer.

The connection between the silicon interposer's front and back surfaces are made using Through Silicon Vias (TSVs). TSVs are expensive due to the use of copper. TSVs are large (tens of microns deep), and the electroplating process is slow. The cycle time is very important in P3C2. It takes a long time to "dig" such deep vias and then fill them with copper.

The presentation provides a detailed comparison between TSMC's CoWoS and Intel's EMIB.

[ImageName: 1898.00] illustrates that CoWoS is a 2.5D IC (TSV) interposer-based packaging technology designed by TSMC for high-performance applications. The diagram shows HBM (High Bandwidth Memory) DRAM dies are stacked with TSVs (Through Silicon Vias). The Compute Logic die is connected to the package substrate using C4 Cu bumps. Microbumps are used to connect the HBM DRAM dies to the Logic die, with short wires for connections within the package and longer wires outside. The diagram also provides a more detailed view of the interposer layout, showing Metal 1, Metal 2, and Metal 3 layers, CBM (Cu Ball Metallization), HK (High-K dielectric), C4 connections, and passivation layers. The presenter notes that TSMC excels in making this process work, yielding high results. Although Intel is aware of it, reverse engineering by companies reveals all the materials and technologies used. However, TSMC's hidden IP and process technology lead to high yield, giving them the key advantage. The presenter will skip the detailed discussion of each layer to focus on the overall comparison, highlighting the broader picture.

[ImageName: 2041.00] shows Intel's EMIB (Embedded Multi-die Interconnect Bridge), which uses 2D scaling with Si bridges embedded in organic substrates. I/O or bumps are placed at the edge of the die with pitches of 55-36 μm. EMIB does not require costly TSVs, which makes it flexible, but the presenter is not convinced. The silicon bridge is embedded inside an organic substrate and they transfer important signals at the edge of the material. The presenter expresses reservations about the scaling potential of Intel's EMIB compared to TSMC. Embedding the silicon "bridge" and creating trenches within the organic material on top could pose challenges for future scaling, especially when dealing with copper inside plastic at very small dimensions compared to silicon. The presenter acknowledges the flexibility of EMIB in connecting different dies but questions its advantages over silicon interposers.

The presenter then references an Intel advertisement video [ImageName: 2105.00, 2142.00, 2149.00, 2217.00, 2304.00]. Intel acknowledges that the size of a 2.5D solution is limited. The presenter notes that TSMC is moving to solve that problem to make it so big with its 3x and 4x reticle sizes in 2023, moving to 6x reticle sizes soon. The presenter concludes on this point that Intel's claim that CoWoS is limited is no longer valid.
The presenter discussed the pros and cons of Intel's EMIB [ImageName: 2355.00], noting that it leverages existing organic packaging technology and enables larger die counts and package configurations. It is lower in cost than a full-size silicon interposer and supports high data rate signaling between adjacent dies, using simple driver/receiver circuitry. EMIB also offers the ability to optimize each die-to-die link individually by customizing the bridge for that link.
However, EMIB has cons that include additional complexity in die bumping and the package assembly process, disparate CTE (Coefficient of Thermal Expansion) between the package, the die, and the EMIB bridge. The presenter also noted the potential for thermal issues related to coefficient of thermal expansion and that the properties of polymers change as they heat up.

AWS is the first EMIB customer [ImageName: 2458.00], utilizing dies fabricated by TSMC at 5nm and packaged by Intel. The presenter refers to the Graviton3 processor, which is chiplet-based with seven silicon dies and 55 billion transistors. It delivers 25% higher performance/core vs. Graviton2 and includes the first DDR5 system in AWS data centers, providing 50% more DDR bandwidth.

[ImageName: 2509.00] compares CoWoS and EMIB. CoWoS benefits from dense fine pitch interconnects and CTE matching, which reduces stresses on the die's backend, reducing the likelihood of failures in the low-K ILD. It also makes chip attach easier due to better CTE matching and lower costs than TSV-based Si interposers. Concerns include the interposer size being limited by reticle field and cost, TSV capacitance impacting signal integrity for signals in off-package links, higher insertion losses in silicon, and complex assembly. EMIB benefits from dense fine pitch interconnects and localized high-density wiring, ensuring the on-package interconnect is not affected by the presence of the bridge. It claims no practical limits to die size, which was a key advantage before TSMC's advancements in reticle sizes. It leverages existing organic substrate manufacturing, making bridge manufacturing simpler than interposer manufacturing since TSV processes are not needed. Bridge silicon costs are lower than silicon interposers due to the lack of TSVs. EMIB concerns include increased organic substrate manufacturing complexity.

The presenter states that the limitation of the interposer size for CoWoS is no longer valid with TSMC's advancements in reticle sizes. Additionally, despite CoWoS's challenges with stress, EMIB faces thermal problems due to the use of polymers. The presenter concludes that high yield and fast throughput are critical, and introducing different proposals to the factory to make everything work is a challenging task.

Intel is also working on other new technologies like Foveros [ImageName: 2588.00], which involves die-to-die stacking with direct copper bonding. With this technology, they don't need TSVs anymore.
[ImageName: 2603.00] shows that Foveros has a fancy interposer (base). The silicon base die has active circuitry relevant for the full operation of the main compute processors found in the top piece of silicon. The bump pitch is 36μm.

The presenter explains that Foveros involves two silicon components: a complete top silicon and a base. The computing components are placed on top, connected to the base die via micro-balls. This refers to direct copper bonding, where connections are pre-manufactured inside each wafer, at the bottom and on the top [ImageName: 2653.00]. They are then bonded together directly.

Direct copper bonding, possibly invented by IBM [ImageName: 2665.00], involves polishing the bottom of the wafer and thinning it down before bonding. Typically, epoxy is **not** used to bond them together.
Direct copper bonding requires careful consideration of the materials used to achieve effective bonding, especially given the different thermal expansion coefficients between silicon dioxide (\(SiO_2\)) and copper (Cu) [ImageName: 2692.00]. The presenter states that tensile stress could be induced due to the thermal stress and large-sized dies could cause reduced process margin, because you need to make sure every bonding works. The presenter said that he/she is going to present a lot of seminars related to this bonding technology in the future. There are reports that Intel is working on it but not confirmed.

The presentation shifts to a discussion on the pros of chiplets [ImageName: 2779.00]. Chiplets offer improved die yield and flexibility, enabling the combination of different technologies for complex systems through massive parallel mounting with high accuracy (<< 1 μm).
The presentation discusses TSMC's strengths and provides insight into why it is more successful than its competitors, Samsung and Intel [ImageName: 3229.00]. TSMC focuses on easing into new technologies to establish a solid baseline. TSMC attracts numerous customers to facilitate process learning, emphasizing that it won't compete with its customers. This encourages more customers to trust TSMC with their designs without concerns about IP theft. TSMC also cultivates local suppliers to create a flywheel effect, fostering close collaboration within the Taiwanese ecosystem. This collaborative approach forms a positive feedback loop for yield learning. TSMC, together with qualified suppliers use as many chemicals, tools and parts as they can, helping the suppliers develop new specifications.

The presenter explains that the flywheel effect refers to building and linking together many suppliers with TSMC [ImageName: 3417.00]. Small wheels (suppliers) turn and assist the big wheel (TSMC), and the big wheel, in turn, helps the small wheels to turn.
TSMC fosters local suppliers to enable a flywheel effect, creating a positive feedback loop for yield learning [ImageName: 3444.00]. TSMC attracts numerous customers, which facilitates process learning and provides valuable insights into diverse customer demands, something an IDM (Integrated Device Manufacturer) like Intel lacks. As indicated on the slide titled "TSMC: The Ecosystem Flywheel" [ImageName: 3480.00, 3567.00], TSMC's IP portfolio is mind-bogglingly large, counting 21,000 IP titles from 0.35um to 5nm. These IPs are free to use for designers. TSMC works with multiple customers and suppliers, forming work groups to solve problems and develop new IPs, inventions, that are shared with everyone else, attracting more customers and creating a positive feedback loop.
Samsung and Intel's IP portfolios are smaller than TSMC's because their business models were not as customer-focused from the beginning. Intel is now breaking away and renaming their manufacturing group into a foundry.

The presentation concludes the discussion of TSMC and Intel, CoWoS, EMIB, Foveros, and chiplets [ImageName: 3604.00]. The presenter hopes the audience has a good understanding of the big picture to guide future deep dives into the process technology aspects. It's crucial to understand the problems being solved and which customers are being served.

## Findings:

1. `google/gemini-2.0-flash-001` follows the instructions best among similarly priced models (e.g. `openai/gpt-4.1-nano`)
2. Iteratively evaluate the quality of the output and fine-tune the instructions
3. Removing repetitive key frames, caused by the presenter going back and forth, makes the text more coherent
4. Skipping continuous key frames that have no transcript lines in between filters out embedded videos and meaningless scrolling
5. TODOs:
    - Add a redundant key frame removal algorithm to the extraction process, based on `n_frames_to_last_repeat` and/or `time_to_last_repeat` criteria
    - Consider adding overlapping content between chunks to enhance continuity
    - Try out a final refining step over all the concatenated text
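
The first TODO could be sketched roughly as below. This is only an illustration under stated assumptions, not the notebook's extraction code: `KeyFrame`, its `signature` field (some content fingerprint such as a perceptual hash) and `drop_repeated_key_frames` are hypothetical names, and the interpretation of `n_frames_to_last_repeat` and `time_to_last_repeat` as "drop a frame if the same content was last seen within this many frames / seconds" is a guess at the intended criteria.

```python
from dataclasses import dataclass
from typing import Hashable, List, Optional


@dataclass(frozen=True)
class KeyFrame:
    timestamp: float     # seconds from the start of the recording
    signature: Hashable  # hypothetical content fingerprint, e.g. a perceptual hash


def drop_repeated_key_frames(
    frames: List[KeyFrame],
    n_frames_to_last_repeat: Optional[int] = None,
    time_to_last_repeat: Optional[float] = None,
) -> List[KeyFrame]:
    """Drop a frame when the same signature was last seen recently.

    "Recently" means within `n_frames_to_last_repeat` frames and/or within
    `time_to_last_repeat` seconds; a criterion set to None is ignored.
    """
    last_seen = {}  # signature -> (index, timestamp) of its most recent occurrence
    kept = []
    for index, frame in enumerate(frames):
        previous = last_seen.get(frame.signature)
        is_repeat = False
        if previous is not None:
            prev_index, prev_ts = previous
            if (n_frames_to_last_repeat is not None
                    and index - prev_index <= n_frames_to_last_repeat):
                is_repeat = True
            if (time_to_last_repeat is not None
                    and frame.timestamp - prev_ts <= time_to_last_repeat):
                is_repeat = True
        # Track dropped frames too, so a run of repeats stays suppressed.
        last_seen[frame.signature] = (index, frame.timestamp)
        if not is_repeat:
            kept.append(frame)
    return kept


# Toy input: slide "a" is revisited shortly after "b", then again much later.
frames = [
    KeyFrame(1.0, "a"),
    KeyFrame(5.0, "b"),
    KeyFrame(9.0, "a"),
    KeyFrame(400.0, "a"),
]
kept_by_time = drop_repeated_key_frames(frames, time_to_last_repeat=60.0)
kept_by_count = drop_repeated_key_frames(frames, n_frames_to_last_repeat=2)
print([f.timestamp for f in kept_by_time])
print([f.timestamp for f in kept_by_count])
```

With the time criterion, the quick return to "a" is dropped but the revisit several minutes later is kept, which may be the desirable behaviour when a presenter deliberately comes back to a slide. The frame-count criterion is stricter here because frame indices say nothing about how much time has passed.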