# Document Summarization

This notebook demonstrates an application of long document summarization techniques to a work of literature.

## Install Dependencies

Granite Kitchen comes with a bundle of dependencies that the notebooks require. See the list of packages in its [`setup.py`](https://github.com/ibm-granite-community/granite-kitchen/blob/main/setup.py).

In [None]:
! pip install git+https://github.com/ibm-granite-community/granite-kitchen \
    transformers \
    torch

## Select your model

Select a Granite Code model from the [`ibm-granite`](https://replicate.com/ibm-granite) org on Replicate. Here we use the Replicate LangChain client to connect to the model.

To get set up with Replicate, see [Getting Started with Replicate](https://github.com/ibm-granite-community/granite-kitchen/blob/main/recipes/Getting_Started/Getting_Started_with_Replicate.ipynb).

To connect to a model on a provider other than Replicate, substitute this code cell with one from the [LLM component recipe](https://github.com/ibm-granite-community/granite-kitchen/blob/main/recipes/Components/Langchain_LLMs.ipynb).

In [None]:
from langchain_community.llms import Replicate
from ibm_granite_community.notebook_utils import get_env_var

model = Replicate(
    model="ibm-granite/granite-8b-code-instruct-128k",
    replicate_api_token=get_env_var('REPLICATE_API_TOKEN'),
)

## Download a book

Here we fetch H.D. Thoreau's "Walden" from [Project Gutenberg](https://www.gutenberg.org/) for summarization.

We have to trim it down so that it will fit in the 128k-token context window of the model.
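
The cut does not have to land in the middle of a sentence. As a minimal sketch, the text can be truncated at the last paragraph break that falls inside the character limit. The helper name `trim_to_paragraph` is hypothetical and not part of the notebook, and the sketch assumes paragraphs are separated by blank lines.

```python
def trim_to_paragraph(text: str, char_limit: int) -> str:
    """Truncate text to at most char_limit chars, preferring a paragraph break."""
    # Assumption: normalize Windows line endings so "\n\n" marks a paragraph break.
    text = text.replace("\r\n", "\n")
    if len(text) <= char_limit:
        return text
    head = text[:char_limit]
    # Find the last blank line (paragraph break) inside the allowed span.
    cut = head.rfind("\n\n")
    # Fall back to a hard cut when no paragraph break is available.
    return head if cut <= 0 else head[:cut]


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
print(trim_to_paragraph(sample, 40))
```

With a limit of 40 characters, the sample keeps the first two paragraphs and drops the partial third one.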

In [None]:
import requests

# The following URL contains a text version of H.D. Thoreau's "Walden"
url = "https://www.gutenberg.org/cache/epub/205/pg205.txt"

# Get the contents
response = requests.get(url)
response.raise_for_status()
contents = response.text

# Extract the text of the book, leaving out the Project Gutenberg boilerplate.
# str.index raises ValueError if a marker is missing, instead of silently returning -1.
start_index = contents.index("*** START OF THE PROJECT GUTENBERG EBOOK WALDEN, AND ON THE DUTY OF CIVIL DISOBEDIENCE ***")
end_index = contents.index("*** END OF THE PROJECT GUTENBERG EBOOK WALDEN, AND ON THE DUTY OF CIVIL DISOBEDIENCE ***")
contents = contents[start_index:end_index]
print("Length of book text: {} chars".format(len(contents)))

# We limit the text to 200k characters, which is about 57k tokens.
# For reference: 300k chars is ~86k tokens, 350k chars is ~100k tokens, 400k chars is ~114k tokens.
char_limit = 200000
contents = contents[:char_limit]
print("Length of text for summarization: {} chars".format(len(contents)))
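
The exact marker strings in the cell above are tied to this one edition. As a hedged sketch, the same extraction can be written against the general shape of the marker lines. The helper name `extract_book_text` and the regular expressions are assumptions for illustration, not part of the notebook or of any Project Gutenberg API. Unlike the cell above, the helper also drops the START marker line itself.

```python
import re

# Assumption: marker lines look like
# "*** START OF THE PROJECT GUTENBERG EBOOK <TITLE> ***" (or "THIS" instead of "THE").
START_RE = re.compile(
    r"^\*\*\* START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK .*\*\*\*\s*$",
    re.MULTILINE,
)
END_RE = re.compile(
    r"^\*\*\* END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK .*\*\*\*\s*$",
    re.MULTILINE,
)


def extract_book_text(raw: str) -> str:
    """Return the text between the START and END marker lines."""
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    # Fail loudly rather than returning a silently truncated book.
    if start is None or end is None or end.start() < start.end():
        raise ValueError("Project Gutenberg START/END markers not found")
    return raw[start.end():end.start()].strip()


raw_sample = (
    "header\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
    "Body text.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
    "license\n"
)
print(extract_book_text(raw_sample))
```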

## Count the tokens

Before sending the book text to the AI model, it's crucial to understand how much of the model's capacity we're using. Language models typically have a limit on the number of tokens they can process in a single request.

Key points:
- We're using the `granite-8b-code-instruct-128k` model, which has a context window of 128,000 tokens
- The context window includes both the input (the book text) and the output (the summary)
- Tokenization can vary between models, so we use the specific tokenizer for our chosen model

Understanding token count helps us optimize our prompts and ensure we're using the model efficiently.
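
The key points above reduce to simple arithmetic: input tokens plus the requested output tokens must not exceed the context window. The sketch below illustrates that check. The function names are hypothetical, the 57,000-token input is the rough estimate for the trimmed text used in this notebook, and the 10,000-token output cap matches the `max_tokens` value used later. It ignores the few extra tokens added by the instruction and the system prompt.

```python
CONTEXT_WINDOW = 128_000  # tokens, as stated for the model used in this notebook


def fits_in_context(input_tokens: int, max_output_tokens: int,
                    context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the input plus the requested output fits in the context window."""
    return input_tokens + max_output_tokens <= context_window


def remaining_output_budget(input_tokens: int,
                            context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the summary after the input is counted (never negative)."""
    return max(context_window - input_tokens, 0)


print(fits_in_context(57_000, 10_000))   # True
print(remaining_output_budget(57_000))   # 71000
```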

In [None]:
from transformers import AutoTokenizer

model_path = "ibm-granite/granite-8b-code-instruct-128k"
tokenizer = AutoTokenizer.from_pretrained(model_path)

print(f"Your document has {len(tokenizer(contents, return_tensors='pt')['input_ids'][0])} tokens.")

## Summarize the text

We construct our final prompt and send it to the AI model on Replicate for processing.

In [None]:
prompt = f"""
Summarize the following text:
{contents}
"""

output = model.invoke(
    prompt,
    model_kwargs={
        "max_tokens": 10000,
        "min_tokens": 0,
        "temperature": 0.75,
        "system_prompt": "You are a helpful assistant.",
        "presence_penalty": 0,
        "frequency_penalty": 0
    }
)

print(output)