Introduction: This notebook walks through several methods for summarizing large documents with generative models.

Install the required packages: the Vertex AI SDK and its supporting libraries (PyPDF2, ratelimit, backoff).

In [None]:
%pip install google-cloud-aiplatform PyPDF2 ratelimit backoff --upgrade --quiet --user

In [1]:
# Automatically restart the kernel after installing packages
# to ensure the environment has access to newly installed dependencies.

# Import the IPython module
import IPython

# Get the current IPython application instance
app = IPython.Application.instance()

# Forcefully restart the kernel to apply the changes
app.kernel.do_shutdown(restart=True)

{'status': 'ok', 'restart': True}

**Authenticating your notebook environment**

In [3]:
!gcloud auth login

You are now logged in as [sfazeli@ualberta.ca].
Your current project is

**Import libraries**

In [7]:
import vertexai

PROJECT_ID = "tidal-datum-451517-i3"  # Your Google Cloud Project ID
vertexai.init(project=PROJECT_ID, location="us-central1")  # Initialize Vertex AI in the specified region

In [35]:
from pathlib import Path
import urllib.request  # `import urllib` alone does not reliably expose `urllib.request`
import warnings

import backoff
from google.api_core import exceptions
import PyPDF2
import ratelimit
from tqdm import tqdm
from vertexai.preview.generative_models import GenerativeModel

warnings.filterwarnings("ignore")

**Preparing data files**

In [21]:
# Download a PDF file to use in the summarization tasks below.
# Define a folder to store the files
data_folder = "data"
Path(data_folder).mkdir(parents=True, exist_ok=True)

# Define the PDF URL and the local path for the downloaded file
pdf_url = "https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf"
pdf_file = Path(data_folder, pdf_url.split("/")[-1])

# Download the file using `urllib` library
urllib.request.urlretrieve(pdf_url, pdf_file)

(PosixPath('data/practitioners_guide_to_mlops_whitepaper.pdf'),
 <http.client.HTTPMessage at 0x7d42cedc7810>)

In [22]:
# Read the PDF file and create a list of pages
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Print three pages from the pdf
for i in range(3):
    text = pages[i].extract_text().strip()
    print(f"Page {i}: {text} \n\n")

Page 0: Practitioners guide to MLOps:  
A framework for continuous 
delivery and automation of  
machine learning.White paper
May 2021
Authors:  
Khalid Salama,  
Jarek Kazmierczak,  
Donna Schut 


Page 1: Table of Contents
Executive summary  3
Overview of MLOps lifecycle and core capabilities  4
Deep dive of MLOps processes  15
Putting it all together  34
Additional resources  36Building an ML-enabled system  6
The MLOps lifecycle  7
MLOps: An end-to-end workflow  8
MLOps capabilities  9
      Experimentation  11
      Data processing  11
      Model training  11
      Model evaluation  12
      Model serving  12
      Online experimentation  13
      Model monitoring  13
      ML pipelines  13
      Model registry  14
      Dataset and feature repository  14
      ML metadata and artifact tracking  15
ML development  16
Training operationalization  18
Continuous training  20
Model deployment  23
Prediction serving  25
Continuous monitoring  26
Data and model management  29
      Dat

**Method 1: Stuffing**

The most straightforward way to give data to a language model is to place ("stuff") it directly into the prompt as context. All relevant text goes into a single prompt, arranged in the order in which the model should read it.

In [23]:
# Read the PDF file and create a list of pages
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Entry string to concatenate all the extracted texts
concatenated_text = ""


# Loop through the pages
for page in tqdm(pages):
    # Extract the text from the page and remove any leading or trailing whitespace
    text = page.extract_text().strip()

    # Append the extracted text to the concatenated text
    concatenated_text += text

print(f"There are {len(concatenated_text)} characters in the pdf")

100%|██████████| 37/37 [00:00<00:00, 87.97it/s]

There are 64758 characters in the pdf





**Create a prompt template that can be used later in the notebook**

In [24]:
prompt_template = """
    Write a concise summary of the following text delimited by triple backquotes.
    Return your response in bullet points which covers the key points of the text.

    ```{text}```

    BULLET POINT SUMMARY:
"""
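One detail worth guarding against: if the document text itself contains triple backquotes, it can close the delimiter early and confuse the model about where the text ends. The sketch below shows one possible guard. `build_prompt` and `demo_template` are hypothetical names, not part of this notebook, and replacing the backquotes with single quotes is just one simple choice.

```python
def build_prompt(template: str, text: str) -> str:
    """Fill `template` with `text`, neutralizing triple backquotes in the text.

    Hypothetical helper for illustration; not part of the notebook.
    """
    # Replace any triple backquotes so they cannot end the delimited block early.
    safe_text = text.replace("```", "'''")
    # str.format only parses the template, so braces inside the text are safe.
    return template.format(text=safe_text)


demo_template = (
    "Write a concise summary of the following text delimited by triple backquotes.\n\n"
    "```{text}```\n\n"
    "SUMMARY:"
)
demo_prompt = build_prompt(demo_template, "Use ```code``` and {braces} freely.")
print(demo_prompt)
```

With the notebook's own template, the call would look like `build_prompt(prompt_template, concatenated_text)`.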

**Use the LLM via the API to summarize the extracted text**

In [28]:
from vertexai.preview.generative_models import GenerativeModel

# Load the Gemini model
generation_model = GenerativeModel("gemini-pro")

# Define the prompt using the prompt template
prompt = prompt_template.format(text=concatenated_text)

# Use the model to summarize the text using the prompt
summary = generation_model.generate_content([prompt])
print(summary)


candidates {
  content {
    role: "model"
    parts {
      text: "### MLOps: A framework for continuous delivery and automation of machine learning\n\n#### Overview\n\n* **MLOps** is a set of standardized processes and technology capabilities for building, deploying, and operationalizing ML systems rapidly and reliably.\n* MLOps practices can result in shorter development cycles, better collaboration between teams, increased reliability, performance, scalability, and security of ML systems, streamlined operational and governance processes, and increased return on investment of ML projects.\n* MLOps is an end-to-end workflow that encompasses seven integrated and iterative processes:\n    * ML development\n    * Training operationalization\n    * Continuous training\n    * Model deployment\n    * Prediction serving\n    * Continuous monitoring\n    * Data and model management\n\n#### Key processes\n\n##### ML development\n\n* This is the core activity of MLOps, where data scientists an

In [29]:
# Extract the summary text from the response
summary_text = summary.candidates[0].content.parts[0].text

# Print the clean summary
print(summary_text)

### MLOps: A framework for continuous delivery and automation of machine learning

#### Overview

* **MLOps** is a set of standardized processes and technology capabilities for building, deploying, and operationalizing ML systems rapidly and reliably.
* MLOps practices can result in shorter development cycles, better collaboration between teams, increased reliability, performance, scalability, and security of ML systems, streamlined operational and governance processes, and increased return on investment of ML projects.
* MLOps is an end-to-end workflow that encompasses seven integrated and iterative processes:
    * ML development
    * Training operationalization
    * Continuous training
    * Model deployment
    * Prediction serving
    * Continuous monitoring
    * Data and model management

#### Key processes

##### ML development

* This is the core activity of MLOps, where data scientists and ML researchers develop and improve models.
* The primary output of this process is a 

If the extracted text exceeds the model's input limit, the API rejects the request with an error such as "400 Request contains an invalid argument". The run above returned a summary, but longer documents or models with smaller context windows will hit this limit.

To prevent this issue, pass only a portion of the extracted text, such as the first 30,000 characters.
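A plain slice can cut the text in the middle of a word. As a minimal sketch (the helper name `truncate_to_budget` is an assumption, not part of the notebook), the text can be trimmed to a character budget and then backed up to the last whitespace:

```python
def truncate_to_budget(text: str, max_chars: int) -> str:
    """Return at most `max_chars` characters, ending on a word boundary if possible."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # If the cut lands inside a word, back up to the last whitespace before it.
    if not text[max_chars].isspace():
        last_space = max(cut.rfind(" "), cut.rfind("\n"))
        if last_space > 0:
            cut = cut[:last_space]
    return cut.rstrip()


sample = "MLOps is a set of standardized processes and technology capabilities"
print(truncate_to_budget(sample, 25))
```

In this notebook the call would be `truncate_to_budget(concatenated_text, 30000)`. The budget is still counted in characters, which only approximates the model's token limit.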

In [30]:
# Define the prompt using the prompt template
prompt = prompt_template.format(text=concatenated_text[:30000])

# Use the model to summarize the text using the prompt
summary = generation_model.generate_content([prompt])

print(summary)

candidates {
  content {
    role: "model"
    parts {
      text: "## MLOps: A Guide to Continuous Delivery and Automation of Machine Learning\n\n**Key Points:**\n\n**MLOps Lifecycle:**\n\n* **7 Integrated Processes:**\n    * ML Development\n    * Training Operationalization\n    * Continuous Training\n    * Model Deployment\n    * Prediction Serving\n    * Continuous Monitoring\n    * Data and Model Management\n* **Workflow:**\n    * **Experimentation:** Data analysis, model prototyping, training procedures\n    * **Data Processing:** Prepare and transform data for ML\n    * **Model Training:** Run training algorithms efficiently\n    * **Model Evaluation:** Assess model effectiveness\n    * **Model Serving:** Deploy and serve models in production\n    * **Online Experimentation:** Understand model performance before release\n    * **Model Monitoring:** Track model performance in production\n\n**MLOps Capabilities:**\n\n* **Core Capabilities:**\n    * Experimentation\n    * Data Proc

Summary

Because the full text can exceed the model's capacity, the model generated a concise, bulleted summary from only a portion of the PDF. Below are the advantages and limitations of the stuffing method:

Advantages:

✔ Requires only a single model call.

✔ The model processes all the provided data at once, which can lead to a higher-quality summary.

Limitations:

✖ Most models have a context length limit, making this method ineffective for large documents or multiple documents.

✖ Suitable only for short texts; longer content must be truncated, so part of the document never reaches the model.

In the next section, alternative techniques are explored to overcome the context length limitations of LLMs when dealing with longer texts.

Adding Rate Limit to Model Calls

When MapReduce or similar methods are used, multiple API calls are made to the model within a short period. However, there is a limit on the number of API calls allowed per minute, requiring a safety measure to be implemented in the code to prevent exceeding this limit. This ensures smooth execution and minimizes errors.

For this approach, the following steps are taken:

- The Python library `ratelimit` is used to restrict the number of API calls per minute.
- The Python library `backoff` is used to retry requests until a maximum time limit is reached.

The function below restricts the number of requests to 20 per minute. It also retries the call with a backoff mechanism when it encounters a "Resource Exhausted" exception from the API or a rate-limit exception raised locally by `ratelimit`. The waiting period grows exponentially between attempts, and retries stop once five minutes have elapsed.
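To make the retry timing concrete, here is a standard-library sketch of exponential backoff with full jitter. It illustrates the general idea only. The function names and default values are assumptions for this example and are not taken from the `backoff` library, whose internals and defaults may differ.

```python
import random


def expo_schedule(max_tries: int, base: float = 2.0, factor: float = 1.0):
    """Upper bound of each wait: factor * base**n for n = 0, 1, 2, ..."""
    return [factor * base**n for n in range(max_tries)]


def jittered_waits(max_tries: int, seed: int = 0):
    """Full jitter: draw each wait uniformly between 0 and its upper bound."""
    rng = random.Random(seed)  # seeded only to make the demo repeatable
    return [rng.uniform(0, bound) for bound in expo_schedule(max_tries)]


print(expo_schedule(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
for attempt, wait in enumerate(jittered_waits(5), start=1):
    print(f"attempt {attempt}: wait {wait:.2f}s")
```

Jitter spreads retries out so that several callers that failed at the same moment do not all retry at the same moment.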

In [36]:
# Load the Gemini model
generation_model = GenerativeModel("gemini-pro")

# Define rate limit settings
CALL_LIMIT = 20  # Number of API calls allowed per minute
ONE_MINUTE = 60  # One minute in seconds
FIVE_MINUTE = 5 * ONE_MINUTE  # Max retry time (5 minutes)

# Function to print messages when retrying due to rate limits
def backoff_hdlr(details):
    print(f"Retrying in {details['wait']} seconds, attempt {details['tries']}...")

# Retry logic with exponential backoff
@backoff.on_exception(
    backoff.expo,  # Exponential backoff strategy
    (exceptions.ResourceExhausted, ratelimit.RateLimitException),  # Handle rate limits
    max_time=FIVE_MINUTE,  # Stop retrying after 5 minutes
    on_backoff=backoff_hdlr,  # Function to call when retrying
)
@ratelimit.limits(calls=CALL_LIMIT, period=ONE_MINUTE)  # Limit API calls per minute
def model_with_limit_and_backoff(prompt):
    """Calls the Gemini model with rate limiting and retry handling."""
    response = generation_model.generate_content([prompt])
    return response.text

# Summarize the prompt defined earlier (the first 30,000 characters of the PDF)
try:
    summary = model_with_limit_and_backoff(prompt)
    print(summary)
except Exception as e:
    print(f"Error: {e}")  # Report any error that survives the retries


## Concise Summary:

This white paper, titled "Practitioner's Guide to MLOps: A Framework for Continuous Delivery and Automation of Machine Learning," provides guidance on implementing MLOps, a framework for automating and continuously delivering machine learning models. Authored by Khalid Salama, Jarek Kazmierczak, and Donna Schut, the paper was published in May 2021. 



Method 2: MapReduce

This approach involves dividing large data into smaller chunks and processing each chunk individually using a prompt. In summarization tasks, each chunk produces a summarized output. Once all the summaries are generated, a separate prompt is applied to merge them into a final summary.

Although this method is more complex than the previous one, it is often more effective for handling large documents. To implement this approach, two prompt templates are prepared:

- One for generating summaries of individual chunks.
- Another for combining the summarized outputs.
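The control flow can be sketched without any model calls. In the example below, `first_sentence` and `to_bullets` are hypothetical stand-ins for the two prompts (per-chunk summary and final combination); only the structure of the map and reduce steps is meant to carry over.

```python
from typing import Callable, List


def map_reduce_summarize(
    chunks: List[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[str], str],
) -> str:
    # Map: summarize each chunk independently.
    partial_summaries = [map_fn(chunk) for chunk in chunks]
    # Reduce: combine the partial summaries into one final summary.
    return reduce_fn("\n".join(partial_summaries))


def first_sentence(text: str) -> str:
    """Stand-in for the per-chunk model call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


def to_bullets(text: str) -> str:
    """Stand-in for the final model call: turn each line into a bullet."""
    return "\n".join(f"* {line}" for line in text.splitlines() if line)


pages_demo = [
    "MLOps standardizes ML delivery. It covers many processes.",
    "Continuous training keeps models fresh. It needs pipelines.",
]
final_demo = map_reduce_summarize(pages_demo, first_sentence, to_bullets)
print(final_demo)
```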

In [37]:
initial_prompt_template = """
    Write a concise summary of the following text delimited by triple backquotes.

    ```{text}```

    CONCISE SUMMARY:
"""

final_prompt_template = """
    Write a concise summary of the following text delimited by triple backquotes.
    Return your response in bullet points which covers the key points of the text.

    ```{text}```

    BULLET POINT SUMMARY:
"""


**Map step**

In this section, the PDF file will be reloaded, and the model will generate a summary for each page separately using the initial prompt template.

In [38]:
# Read the PDF file and create a list of pages
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Create an empty list to store the summaries
initial_summary = []

# Iterate over the pages and generate a summary for each page
for page in tqdm(pages):
    # Extract the text from the page and remove any leading or trailing whitespace
    text = page.extract_text().strip()

    # Create a prompt for the model using the extracted text and a prompt template
    prompt = initial_prompt_template.format(text=text)

    # Generate a summary using the model and the prompt
    summary = model_with_limit_and_backoff(prompt)

    # Append the summary to the list of summaries
    initial_summary.append(summary)


 54%|█████▍    | 20/37 [00:47<00:41,  2.43s/it]INFO:backoff:Backing off model_with_limit_and_backoff(...) for 0.5s (ratelimit.exception.RateLimitException: too many calls)


Retrying in 0.4803064393578632 seconds, attempt 1...


INFO:backoff:Backing off model_with_limit_and_backoff(...) for 1.2s (ratelimit.exception.RateLimitException: too many calls)


Retrying in 1.1844697426212671 seconds, attempt 2...


INFO:backoff:Backing off model_with_limit_and_backoff(...) for 2.6s (ratelimit.exception.RateLimitException: too many calls)


Retrying in 2.612145751891835 seconds, attempt 3...


INFO:backoff:Backing off model_with_limit_and_backoff(...) for 6.8s (ratelimit.exception.RateLimitException: too many calls)


Retrying in 6.799675111144275 seconds, attempt 4...


INFO:backoff:Backing off model_with_limit_and_backoff(...) for 5.3s (ratelimit.exception.RateLimitException: too many calls)


Retrying in 5.3478317755874425 seconds, attempt 5...


100%|██████████| 37/37 [01:38<00:00,  2.67s/it]




Take a look at the first few summaries from the initial Map phase.


In [40]:
# Print the first 10 summaries in a structured format
print("\n📜 **Extracted Summaries from PDF:**\n")
for i, summary in enumerate(initial_summary[:10]):
    print(f"📄 **Page {i+1} Summary:**\n{summary}\n{'-'*80}\n")


📜 **Extracted Summaries from PDF:**

📄 **Page 1 Summary:**
A guide to MLOps, a framework for continuous delivery and automation of machine learning. Authored by Khalid Salama, Jarek Kazmierczak, and Donna Schut in May 2021.

--------------------------------------------------------------------------------

📄 **Page 2 Summary:**
## Concise Summary

This document presents a comprehensive overview of **MLOps** (Machine Learning Operations) including its lifecycle, processes, and capabilities. It also covers essential components like ML Pipelines, Model Management, and Data Management.

**Key Highlights:**

* **MLOps Lifecycle:** Explains the different stages involved in deploying and managing ML models, from development to monitoring and prediction serving.
* **MLOps Capabilities:** Explores various capabilities enabling effective ML model development, including data processing, model training, evaluation, serving, monitoring, and continuous training.
* **ML Pipelines:** emphasizes the cr

The number of characters in the initial summary will be counted to determine if it fits within the prompt's limitations.

In [41]:
len("\n".join(initial_summary))

30664

The combined summaries total 30,664 characters, about the same size as the 30,000-character prompt used earlier, so they can be passed to the model in a single prompt. This is done in the next step.
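When the joined summaries do not fit in one prompt, one option is to split them into batches that each stay under a character budget and reduce the batches separately before a final reduce. The helper below is a sketch under that assumption; `batch_by_chars` is a hypothetical name, and the budget is measured in characters rather than tokens.

```python
from typing import List


def batch_by_chars(summaries: List[str], max_chars: int, sep: str = "\n") -> List[List[str]]:
    """Group summaries so that each joined batch stays within `max_chars`.

    A single summary longer than `max_chars` still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for summary in summaries:
        # Account for the separator when the batch already has entries.
        added = len(summary) + (len(sep) if current else 0)
        if current and size + added > max_chars:
            batches.append(current)
            current, size = [], 0
            added = len(summary)
        current.append(summary)
        size += added
    if current:
        batches.append(current)
    return batches


demo_summaries = ["a" * 40, "b" * 40, "c" * 40]
demo_batches = batch_by_chars(demo_summaries, max_chars=100)
print([len(batch) for batch in demo_batches])  # [2, 1]
```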

**Reduce Step**

A reduce function will be created to concatenate the summaries generated in the initial summarization step (Map step) and then apply the final prompt template to produce a more concise summary.

In [48]:
# Define a function to create a summary of the summaries
def reduce(initial_summary, prompt_template):
    # Concatenate the summaries from the initial step
    concat_summary = "\n".join(initial_summary)

    # Create a prompt for the model using the concatenated text and a prompt template
    prompt = prompt_template.format(text=concat_summary)

    # Generate a summary using the model and the prompt
    summary = model_with_limit_and_backoff(prompt)

    return summary

Now, the next step involves **combining all summaries** into a more concise version using the **final prompt template** and the previously created function.

In [49]:
# Use defined `reduce` function to summarize the summaries
summary = reduce(initial_summary, final_prompt_template)

print(summary)

## Concise Summary:
**Key points:**
* This document provides an overview of MLOps (Machine Learning Operations) and its importance for effective model development and deployment.
* Key MLOps capabilities include automated and streamlined model building, data and model management, continuous training, model deployment, and performance monitoring.
* MLOps facilitates continuous improvement, collaboration, reliability, and faster development cycles for ML systems.
* It targets technology leaders, enterprise architects, and ML teams.


## Summary of MLOps Process and Stages:
* MLOps lifecycle involves seven iterative processes, covering the entire ML workflow.
* This includes: ML Development, Training Operationalization, Continuous Training, Model Deployment, Prediction Serving, Online Experimentation, Monitoring, Data & Model Management.


## Summary of Key MLOps Capabilities:
* Experimentation: Collaborative data exploration, modular source code, experiment tracking, analysis and visuali

Recap

The entire paper was summarized into a few bullet points using the MapReduce method. Below are its advantages and limitations:

Pros:

✔ Capable of summarizing large documents.

✔ Supports parallel processing, as each page is summarized independently.

Cons:

✖ Requires multiple model calls, increasing computational cost.

✖ Loss of context between pages since each is processed separately.
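Because the page summaries are independent, the Map step could in principle run concurrently. The sketch below uses a thread pool from the standard library with a stub in place of the model call. `parallel_map` and `summarize_stub` are hypothetical names, and the per-minute rate limit discussed earlier would still apply to real calls.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def summarize_stub(text: str) -> str:
    """Stand-in for a model call: return the first 20 characters."""
    return text[:20]


def parallel_map(texts: List[str], summarize: Callable[[str], str], max_workers: int = 4) -> List[str]:
    # Executor.map returns results in input order, so page order is preserved
    # even when the calls finish out of order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(summarize, texts))


demo_pages = ["first page text " * 3, "second page", "third page"]
print(parallel_map(demo_pages, summarize_stub))
```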

### **Next Section**  

In the following section, an alternative method will be explored, which processes **multiple chunks (pages) per prompt** to generate a more comprehensive summary.

**Method 3: MapReduce with Overlapping Chunks**

### **Overview of the Method: Overlapping Chunks**  

This approach is similar to **MapReduce** but introduces a key improvement: **overlapping chunks**. Instead of summarizing each page independently, consecutive pages are grouped into chunks that share pages with their neighbors (with a chunk size of 2: pages 1 and 2, then pages 2 and 3, and so on). This technique helps **preserve context** and **retain more information between chunks**, which can lead to **more accurate summaries**.  

However, combining multiple chunks may sometimes **exceed the model's token limit**. If this happens, possible solutions include:  
- **Using a chunk-splitting method** to divide the text more efficiently.  
- **Removing some initial chunks** strategically to stay within the token limit.
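The windowing itself can be sketched independently of the model. The function below is an illustrative assumption (its name and the `step` parameter are not from the notebook): it yields windows of `size` consecutive items and stops once a window reaches the last item.

```python
from typing import List, Sequence


def overlapping_chunks(items: Sequence, size: int, step: int = 1) -> List[list]:
    """Split `items` into windows of `size` items that advance by `step`."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    chunks = []
    for start in range(0, len(items), step):
        chunks.append(list(items[start:start + size]))
        # Stop once the current window already includes the last item.
        if start + size >= len(items):
            break
    return chunks


print(overlapping_chunks(range(5), size=2))          # [[0, 1], [1, 2], [2, 3], [3, 4]]
print(overlapping_chunks(range(5), size=3, step=2))  # [[0, 1, 2], [2, 3, 4]]
```

A larger `step` reduces the number of model calls at the cost of less overlap between neighboring chunks.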

**Map Step**

In this section, the PDF file will be processed again, and the model will be used to summarize multiple pages together rather than individually. The initial prompt template defined earlier will be applied to generate summaries while preserving context across pages.

In [50]:
# Read the PDF file and create a list of pages
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Create an empty list to store the extracted text from the pages
text_from_pages = []

# Iterate over the pages and extract the text from each page
for page in tqdm(pages):
    # Extract the text from the page and remove any leading or trailing whitespace
    text = page.extract_text().strip()

    # Append the extracted text to the list of extracted text
    text_from_pages.append(text)

100%|██████████| 37/37 [00:00<00:00, 87.81it/s]


 **Defining Chunk Size and Summarizing Chunks**  

In this step, the **chunk size** (number of pages to be combined per summary) will be specified. The model will then process and summarize each chunk using the predefined prompt template, ensuring that multiple pages are summarized together while maintaining context.

In [51]:
CHUNK_SIZE = 2  # number of consecutive pages per chunk (adjacent chunks overlap)

# Read the PDF file and create a list of pages
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Create an empty list to store the summaries
initial_summary = []

# Iterate over the pages and generate a summary for a few pages as one chunk based on `CHUNK_SIZE`
for i in tqdm(range(len(pages))):
    # Select a list of pages to merge as one chunk
    pages_to_merge = [x for x in range(i, i + CHUNK_SIZE) if x < len(pages)]

    extracted_texts = [text_from_pages[x] for x in pages_to_merge]

    # Concatenate the extracted texts into a single chunk
    text = "\n".join(extracted_texts)

    # Create a prompt for the model using the concatenated text and a prompt template
    prompt = initial_prompt_template.format(text=text)

    # Generate a summary using the model and the prompt
    summary = model_with_limit_and_backoff(prompt)

    # Append the summary to the list of summaries
    initial_summary.append(summary)

    # Stop once the chunk containing the last page has been summarized
    # (page indices are zero-based, so the last index is len(pages) - 1)
    if pages_to_merge[-1] == len(pages) - 1:
        break


100%|██████████| 37/37 [01:48<00:00,  2.94s/it]


In [52]:
print("\n\n".join(initial_summary[:10]))

## Concise Summary of MLOps Practitioner's Guide:

**Focus:** Implementing MLOps framework for continuous delivery and automation of machine learning 

**Key Highlights:**

* **MLOps Lifecycle:** Defines and breaks down the stages to build an ML-enabled system.
* **MLOps Capabilities:** Covers essential areas like experimentation, data processing, model training, evaluation, serving, monitoring, etc.
* **Key Stages:** Includes guidance on development, training operationalization, deployment, prediction serving, and monitoring.
* **Additional Resources:** Provides references for deep understanding.


**Overall:** A valuable resource for practitioners to implement robust and efficient MLOps workflows.

## Concise Summary of ML-Ops Lifecycle and Capabilities

**ML-Ops** is a set of standardized processes and technologies for building, deploying, and operationalizing machine learning (ML) systems rapidly and reliably. It aims to address the unique complexities of ML applications and improv

### **Reduce Step**  

The next step involves **combining all generated summaries** into a more concise version using the **final prompt template** and the previously implemented function. This step further refines the summary while preserving key information.

In [53]:
# Use defined `reduce` function to summarize the summaries
summary = reduce(initial_summary, final_prompt_template)

print(summary)

I am ready to provide summaries in bullet points, when you are. 



### **Recap**  

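In this run, the reduce call did not return a usable summary: as the output above shows, the model replied with a generic message instead of bullet points. A likely cause is the size of the input, since overlapping chunks produce longer summaries and the concatenated text may exceed what the model handles in a single prompt. Below are the advantages and limitations of the **MapReduce with Overlapping Chunks** method:  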

#### **Pros:**  
✔ **Capable of summarizing large documents** efficiently.  
✔ **Preserves context** between pages by summarizing sequential pages together.  
✔ **Supports parallel processing**, as summaries are generated independently.  

#### **Cons:**  
✖ **Requires multiple model calls**, increasing processing overhead.  
✖ **Slightly slower** than the standard MapReduce method.  
✖ **Produces larger input text**, which may approach the model’s token limit.

### **Next Section**  

In the upcoming section, a different approach will be explored, where **only the summary from the previous page** is used instead of the full text. This method helps retain context across pages while reducing the input size for the model.

### **Method 4: MapReduce with Rolling Summary (Refine)**  

In some cases, summarizing multiple pages together may exceed the model’s token limit. To address this, a **rolling summary approach** will be used.  

Instead of summarizing chunks independently, this method takes **the summary from the previous step** along with the **next page** to generate a refined summary. This ensures that each prompt retains **context from the previous page**, leading to a more **coherent and accurate** final summary.
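The rolling loop can be sketched with a stub in place of the model. `rolling_summaries` and `stub_summarize` are hypothetical names. The stub simply appends the page to the context so the carry-over is visible, whereas a real model call would compress it.

```python
from typing import Callable, List


def rolling_summaries(pages: List[str], summarize: Callable[[str, str], str]) -> List[str]:
    """Summarize pages in order, passing the previous summary as context."""
    summaries = []
    context = ""  # the first page has no previous summary
    for text in pages:
        summary = summarize(context, text)
        summaries.append(summary)
        context = summary  # the next page sees this summary as its context
    return summaries


def stub_summarize(context: str, text: str) -> str:
    """Stand-in for a model call: make the carried-over context visible."""
    return f"{context}>{text}" if context else text


print(rolling_summaries(["p1", "p2", "p3"], stub_summarize))
# ['p1', 'p1>p2', 'p1>p2>p3']
```

Each call depends on the previous result, which is why this method cannot be parallelized the way the plain Map step can.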

In [55]:
initial_prompt_template = """
    Taking the following context delimited by triple backquotes into consideration:

    ```{context}```

    Write a concise summary of the following text delimited by triple backquotes.

    ```{text}```

    CONCISE SUMMARY:
"""


# Read the PDF file and create a list of pages.
reader = PyPDF2.PdfReader(pdf_file)
pages = reader.pages

# Create an empty list to store the summaries.
initial_summary = []

# Iterate over the pages and generate a summary
for idx, page in enumerate(tqdm(pages)):
    # Extract the text from the page and remove any leading or trailing whitespace.
    text = page.extract_text().strip()

    if idx == 0:  # if current page is the first page, no previous context
        prompt = initial_prompt_template.format(context="", text=text)

    else:  # if current page is not the first page, previous context is the summary of the previous page
        prompt = initial_prompt_template.format(
            context=initial_summary[idx - 1], text=text
        )

    # Generate a summary using the model and the prompt
    summary = model_with_limit_and_backoff(prompt)

    # Append the summary to the list of summaries
    initial_summary.append(summary)


100%|██████████| 37/37 [01:21<00:00,  2.21s/it]


### **Listing Initial Summary Entries**  

In this step, a few entries from the **initial summary list** are displayed. They show how the summaries have been generated so far and how information carries forward from page to page before the reduce step is applied.

In [56]:
initial_summary[:10]

['## Concise Summary:\n\n**MLOps: A Framework for Continuous Delivery and Automation of Machine Learning**\n\nThis white paper, published in May 2021, by Khalid Salama, Jarek Kazmierczak, and Donna Schut, provides a guide for practitioners on MLOps, a framework for continuous delivery and automation of machine learning. \n',
 '## Concise Summary:\n\nThis white paper explores **MLOps**, a framework for continuous delivery and automation of machine learning (ML) workflows. \n\nThe text outlines the **MLOps lifecycle** and its core capabilities, including:\n\n* **ML development:** Design, implementation, and testing of ML models.\n* **Training operationalization:** Automating model training and deployment.\n* **Continuous training:** Updating models with fresh data.\n* **Model deployment:** Serving models for online predictions.\n* **Continuous monitoring:** Tracking model performance and data quality.\n* **Data and model management:** Organizing and governing data and models.\n\n The doc

Handling Duplicate Entries

Since the rolling summary approach carries over context from previous pages, some duplicate entries in the summary list are expected. A `set()` would remove them but would also discard the page order, so the duplicates are removed with `dict.fromkeys()`, which keeps the first occurrence of each summary in its original position.

In [57]:
initial_summary = list(dict.fromkeys(initial_summary))  # remove duplicates, keep page order

### **Reduce Step**  

The next step involves **combining all refined summaries** into a more concise version using the **final prompt template** and the previously implemented function. This step ensures that the summary remains **coherent and compact** while preserving key contextual information.

In [58]:
# Use defined `reduce` function to summarize the summaries
summary = reduce(initial_summary, final_prompt_template)

print(summary)

## Concise Summary of MLOps White Paper:

**Key Points:**

* **MLOps** is a framework for automating and streamlining the lifecycle of ML models, from development to deployment and monitoring.
* **Benefits:** Reduced development time, improved collaboration, increased reliability, better ROI.
* **Essential features:** Dataset & feature repository, model registry, ML pipelines, model serving, online experimentation, model monitoring.
* **Challenges:** Scaling, automation, integration, talent shortage, governance.
* **Key MLOps capabilities:**

  * **Experimentation:** Data preparation, model prototyping, validation.
  * **Data processing:** Efficient data transformation for ML tasks.
  * **Model training:** Continuous training pipelines for updated models.
  * **Model deployment:** Packaging, testing, and serving models for online experimentation and production.
  * **Model monitoring:** Detecting and addressing issues in prediction serving.
* **Additional features:** Cost optimization,

### **Recap**  

The model successfully summarized the entire paper into a few bullet points using the **MapReduce with Rolling Summary** method. Below are the advantages and limitations of this approach:  

#### **Pros:**  
✔ **Capable of summarizing large documents** effectively.  
✔ **Preserves context** across pages by incorporating summaries from previous sections.  

#### **Cons:**  
✖ **Requires multiple model calls**, increasing processing time.  
✖ **Not suitable for parallel processing**, as each summary depends on the previous one.