# Large language model application for hierarchical summarization of scientific papers
### Fedor Sobolevsky, MIPT 2025

This notebook contains the experiments for my bachelor thesis *«Application of Large Language Models for Hierarchical Summarization of Scientific Papers»*. I use the text tree edit distance (TTED) metric introduced in that work to assess how well large language models (LLMs) perform hierarchical summarization of texts in the form of sentence-based mind maps. The experiments focus on scientific papers, but the methods described below are designed to apply to any type of text.

### Setup

In [19]:
import getpass
import os
import json
from tqdm import tqdm

from langchain.document_loaders import PyPDFLoader
from langmem import create_prompt_optimizer
from langchain_core.messages import HumanMessage, AIMessage

from sentence_transformers import SentenceTransformer, SimilarityFunction

from tted.computation import text_tree_distance
from tted.tree_format import Node, json_to_node, dict_to_node

In [2]:
def read_text(filename: str):
    '''
    Returns text from text file.
    '''
    with open(filename, "r") as file:
        content = file.read()
        return content

In [3]:
def parse_pdf(filename: str):
    '''
    Function for pdf document parsing. Returns the pdf text.
    '''
    loader = PyPDFLoader(filename)
    pages = loader.load()
    text = " ".join([p.page_content for p in pages])

    return text

In [4]:
def extract_json_from_text(text: str):
    '''
    Extract JSON data from a model response containing a JSON text tree.

    Returns the parsed text tree, or None (after printing the error) if the response format is incorrect.
    '''
    try:
        # Parse everything between the first '{' and the last '}' to skip any surrounding text.
        json_tree = json.loads(text[text.find('{'): text.rfind('}')+1])
        return json_tree
    except Exception as e:
        print('Cannot extract JSON from text:', e)
        return None
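
To illustrate why the slice between the first `{` and the last `}` is used, here is a minimal self-contained sketch. The response string is made up for the example, and `extract_json` is a hypothetical stand-alone copy of the extraction idea rather than part of the notebook.

```python
import json

def extract_json(text: str):
    """Stand-alone copy of the extraction idea: parse the outermost {...} span."""
    try:
        return json.loads(text[text.find('{'): text.rfind('}') + 1])
    except json.JSONDecodeError:
        return None

# A made-up model response that wraps the tree in extra prose.
response = 'Here is the mind map:\n{"Main idea.": {"Key point.": {}}}\nHope this helps!'

print(extract_json(response))          # the parsed nested dict
print(extract_json('no braces here'))  # None, since there is nothing to parse
```

Slicing this way tolerates leading and trailing chatter, but it still fails if the model emits two separate JSON objects or malformed JSON.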

### Method 1: Direct Prompting

In [7]:
def create_prompt(prompt_file: str, document_pdf: str):
    '''
    Create prompt for LLM from prompt template and article PDF.

    Arguments:
        prompt_file: str - name of text file containing prompt template.
        document_pdf: str - name of PDF file containing article.
    '''
    prompt = read_text(prompt_file) + parse_pdf(document_pdf)

    return prompt

def prompt_model(prompt: str, model):
    '''
    Prompts model with given text prompt.
    '''
    messages = [HumanMessage(content=prompt)]
    response = model.invoke(messages)

    return response

In [8]:
import langchain_mistralai
from langchain_mistralai.chat_models import ChatMistralAI

if "MISTRAL_API_KEY" not in os.environ:
    os.environ["MISTRAL_API_KEY"] = getpass.getpass("Enter your Mistral API key: ")

model = ChatMistralAI(model='mistral-large-latest')

Enter your Mistral API key:  ········


In [9]:
prompt = create_prompt(prompt_file='prompts/human_prompt_0.txt', document_pdf='data/papers/paper_0.pdf')
model_response = prompt_model(prompt=prompt, model=model)

print('Tokens used:', model_response.usage_metadata['total_tokens'])

Tokens used: 16279


In [10]:
tree = dict_to_node(extract_json_from_text(model_response.content))
print(tree)

Hierarchical Summarization: Scaling Up Multi-Document Summarization.
-Multi-document summarization (MDS) systems have been designed for short, unstructured summaries of 10-15 documents, and are inadequate for larger document collections.
--We propose a new approach to scaling up summarization called hierarchical summarization, and present the first implemented system, SUMMA.
---SUMMA produces a hierarchy of relatively short summaries, in which the top level provides a general overview and users can navigate the hierarchy to drill down for more details on topics of interest.
---SUMMA optimizes for coherence as well as coverage of salient information.
---In an Amazon Mechanical Turk evaluation, users preferred SUMMA ten times as often as flat MDS and three times as often as timelines.
-The explosion in the number of documents on the Web necessitates automated approaches that organize and summarize large document collections on a complex topic.
--Existing methods for multi-document summar

In [11]:
reference_tree = json_to_node('data/reference_maps/paper_0.json')
scoring_model = SentenceTransformer('sentence-transformers/paraphrase-distilroberta-base-v1')

def cos_dist(a_embedding, b_embedding):
    return float(1 - scoring_model.similarity(a_embedding, b_embedding))

In [12]:
dist = text_tree_distance(tree, reference_tree, scoring_model.encode, cos_dist)
print(dist)

13.947549819946289
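
To give some intuition for this number, the sketch below illustrates the kind of edit costs TTED appears to combine, going by the description in the example tree used in the prompts later in this notebook: an update costs the embedding distance between two sentences, and an insertion or deletion is priced against an empty sentence. This is only an illustration under that assumption. `toy_embed` is a hypothetical bag-of-words stand-in for `scoring_model.encode`, and the real `tted` package may define its costs differently.

```python
import math
from collections import Counter

def toy_embed(sentence: str) -> Counter:
    """Hypothetical stand-in for a sentence encoder: word counts."""
    return Counter(sentence.lower().split())

def toy_cos_dist(a: Counter, b: Counter) -> float:
    """Cosine distance between two count vectors; 1.0 if either is all zeros."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm

def update_cost(old: str, new: str) -> float:
    """Assumed cost of relabeling a node: distance between the two sentences."""
    return toy_cos_dist(toy_embed(old), toy_embed(new))

def insert_or_delete_cost(sentence: str) -> float:
    """Assumed cost of adding or removing a node: distance to an empty sentence."""
    return toy_cos_dist(toy_embed(sentence), toy_embed(''))

print(update_cost('SUMMA optimizes coherence', 'SUMMA optimizes salience'))
print(insert_or_delete_cost('SUMMA optimizes coherence'))
```

With this toy embedding an empty sentence maps to the zero vector, so every insertion or deletion costs exactly 1.0. A real sentence encoder generally embeds the empty string to a non-zero vector, so the costs would vary per node.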


### Method 2: Automatic optimization of human prompts

First, let's create a wrapper for mind map generation and scoring:

In [13]:
def generate_mind_map(prompt, document_text, model, reference_map, scorer):
    '''
    Generate mind map of document using given LLM and compare it with baseline map using scorer.

    Arguments:
        prompt: str - prompt template for mind map generation.
        document_text: str - text of document to be summarized.
        model - language model for mind map generation.
        reference_map: Node - reference mind map for generation scoring.
        scorer - comparison function for comparing the generated map and reference one.

    Output:
        conversation - list of two message dicts: the user prompt and the model response, which should contain the generated mind map.
        feedback - dict containing a score for prompt optimization, plus a comment if the map format is invalid.
    '''
    kTreeExample = '''{
      "A new algorithm for text tree edit distance based on Zhang-Shasha's algorithm and BERT-like model embedding similarity.": {
        "The algorithm's novelty is in its similarity measure based on BERT-like model embeddings.": {
          "Embedding distance is used as a measure of semantic similarity.": {},
          "The language model allows to capture semantic meaning of sentences and model their similarity.": {}
        },
        "Zhang-Shasha's algorithm is used to compute tree edit distance with new edit costs.": {
          "Semantic similarity is used as the update cost in the algorithm.": {},
          "The costs of insertion and removal of nodes are defined as the similarity of the node and an empty sentence.": {}
        },
        "The proposed algorithm is presented as a more informative metric of similarity between text trees.": {
          "The current ways of comparing text trees overlook their tree structure or the meaning of their labels.": {},
          "This new method can be used, for example, to compare mind maps or hierarchical summaries.": {}
        }
      }
    }'''
    
    full_prompt = prompt + document_text
    model_response = prompt_model(full_prompt, model).content

    conversation = [
        {'role': 'user', 'content': full_prompt},
        {'role': 'assistant', 'content': model_response},
    ]

    json_tree = extract_json_from_text(model_response)
    if json_tree is None:
        # Extraction failed, so skip dict_to_node and report the format error instead.
        feedback = {
            'score': -100000,
            'comment': 'Invalid format: response should contain one text tree in the JSON format as in the following example: '+kTreeExample,
        }
    else:
        tree = dict_to_node(json_tree)
        score = scorer(tree, reference_map)
        feedback = {'score': score}

    return conversation, feedback

In [14]:
prompt_templates = [read_text('prompts/human_prompt_'+str(i)+'.txt') for i in range(4)]
document_text = parse_pdf('data/papers/paper_0.pdf')
trajectories = []

model = ChatMistralAI(model='mistral-large-latest')

reference_tree = json_to_node('data/reference_maps/paper_0.json')
scoring_model = SentenceTransformer('sentence-transformers/paraphrase-distilroberta-base-v1')

def cos_dist(a_embedding, b_embedding):
    return float(1 - scoring_model.similarity(a_embedding, b_embedding))

def scorer(tree_a, tree_b):
    return -text_tree_distance(tree_a, tree_b, scoring_model.encode, cos_dist)

for prompt in tqdm(prompt_templates):
    conversation, feedback = generate_mind_map(prompt, document_text, model, reference_tree, scorer)

    trajectories.append((conversation, feedback))

100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [01:05<00:00, 16.36s/it]
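
Before handing the trajectories to an optimizer, it can be useful to check which hand-written prompt scored best. A minimal sketch, assuming each trajectory is a `(conversation, feedback)` pair as built above, with `feedback['score']` equal to the negated TTED. The helper `rank_trajectories` is hypothetical, and the scores below are invented placeholders, not results from this notebook.

```python
def rank_trajectories(trajectories):
    """Return (prompt_index, score) pairs sorted from best to worst score."""
    scored = [(i, feedback['score']) for i, (_, feedback) in enumerate(trajectories)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Placeholder trajectories: conversations are omitted and the scores are made up.
example_trajectories = [
    ([], {'score': -14.0}),
    ([], {'score': -9.5}),
    ([], {'score': -100000, 'comment': 'Invalid format'}),
]

for index, score in rank_trajectories(example_trajectories):
    print(f'prompt {index}: score {score}')
```

Because scores are negated distances, the highest (least negative) score corresponds to the map closest to the reference, and invalid responses sink to the bottom.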


In [16]:
optimizer = create_prompt_optimizer(
    "mistral-large-latest",
    kind="gradient",
    config={"max_reflection_steps": 5, "min_reflection_steps": 0},
)

updated = optimizer.invoke({"trajectories": trajectories, "prompt": prompt_templates[0]})
updated

'Generate a sentence-based mind map of the given scientific paper. Output it in the JSON format as in the following example:\n{\n  "A new algorithm for text tree edit distance based on Zhang-Shasha\'s algorithm and BERT-like model embedding similarity.": {\n    "The algorithm\'s novelty is in its similarity measure based on BERT-like model embeddings.": {\n      "Embedding distance is used as a measure of semantic similarity.": {},\n      "The language model allows to capture semantic meaning of sentences and model their similarity.": {}\n    },\n    "Zhang-Shasha\'s algorithm is used to compute tree edit distance with new edit costs.": {\n      "Semantic similarity is used as the update cost in the algorithm.": {},\n      "The costs of insertion and removal of nodes are defined as the similarity of the node and an empty sentence.": {}\n    },\n    "The proposed algorithm is presented as a more informative metric of similarity between text trees.": {\n      "The current ways of compari

In [17]:
prompt = updated + parse_pdf('data/papers/paper_0.pdf')
model_response = prompt_model(prompt=prompt, model=model)
tree = dict_to_node(extract_json_from_text(model_response.content))
score = -text_tree_distance(tree, reference_tree, scoring_model.encode, cos_dist)

In [18]:
print(tree)
print(score)

Hierarchical Summarization: Scaling Up Multi-Document Summarization.
-Multi-document summarization (MDS) systems have been designed for short, unstructured summaries of 10-15 documents, and are inadequate for larger document collections.
--We propose a new approach to scaling up summarization called hierarchical summarization, and present the first implemented system, SUMMA.
---SUMMA produces a hierarchy of relatively short summaries, in which the top level provides a general overview and users can navigate the hierarchy to drill down for more details on topics of interest.
---SUMMA optimizes for coherence as well as coverage of salient information.
---In an Amazon Mechanical Turk evaluation, users preferred SUMMA ten times as often as flat MDS and three times as often as timelines.
-The explosion in the number of documents on the Web necessitates automated approaches that organize and summarize large document collections on a complex topic.
--Existing methods for multi-document summar

Why does this method yield suboptimal results? The ProTeGi prompt optimization method (Pryzant et al., 2023) relies on explicit textual feedback to generate textual gradients, whereas the only feedback we can provide automatically in our task is a numerical score or a pre-written message about invalid tree format.

### Method 3: Consecutive prompting and map construction

The idea behind this method is to let the user interact with the system and decide which details they would like to add to their map of the paper.

In [20]:
main_prompt = f'''Study the following paper:
{document_text}
Write out the main idea of the paper in one sentence. Output only this sentence.'''
chat_history = [HumanMessage(content=main_prompt)]
main_idea = prompt_model(main_prompt, model).content

In [21]:
print(main_idea)

'The paper introduces a novel approach to large-scale multi-document summarization called hierarchical summarization, which organizes information in a coherent hierarchy and allows users to navigate and explore details based on their interests.'

In [27]:
details_prompt = '''Now, given the paper and its main idea, write 6 questions asking for details. 
Enumerate the questions and output only them.'''
chat_history += [AIMessage(content=main_idea), HumanMessage(content=details_prompt)]
questions = model.invoke(chat_history).content

In [28]:
print(questions)

1. What are the key characteristics of hierarchical summarization that make it different from traditional multi-document summarization?
2. How does the SUMMA system implement hierarchical summarization, and what methodologies does it use?
3. What are the metrics used to evaluate the quality of a hierarchical summary, and how are they defined?
4. How does the SUMMA system ensure coherence between parent and child summaries in the hierarchy?
5. What are the results of the user study comparing SUMMA with timelines and flat multi-document summaries?
6. What are the future directions and improvements planned for the SUMMA system and hierarchical summarization?


In [29]:
selected_questions = input('Input the numbers of questions you would like to get answers to: ')
question_answering_prompt = f'''Answer questions {selected_questions} according to the paper. 
Each answer should be one sentence long. Output only the answers without enumeration, each from a new line.'''
chat_history += [AIMessage(content=questions), HumanMessage(content=question_answering_prompt)]
answers = model.invoke(chat_history).content

Input the numbers of questions you would like to get answers to:  1, 2, 5


In [30]:
print(answers)

Hierarchical summarization organizes information along principles such as time, location, entities, or events, allowing users to navigate from a general overview to more detailed child summaries.
The SUMMA system implements hierarchical summarization by first clustering sentences temporally and then summarizing these clusters using an objective function that optimizes for salience and coherence.
The user study found that participants preferred SUMMA over timelines three times as often and over flat summaries ten times as often, and they learned just as much from SUMMA as from flat summaries.


In [31]:
more_details_prompt = '''Now, given the paper and these points, write 6 questions for each point asking for details. 
Enumerate the questions and output only them.'''
chat_history += [AIMessage(content=answers), HumanMessage(content=more_details_prompt)]
more_questions = model.invoke(chat_history).content

In [32]:
print(more_questions)

### Point 1: Hierarchical Summarization Characteristics

1. How does hierarchical summarization ensure that the information presented at the start is small and grows only as the user directs it?
2. What specific mechanisms does hierarchical summarization use to allow users to tailor the output to their interests?
3. How does the organization of summaries along principles like time, location, entities, or events contribute to the coherence of the information?
4. What are the advantages of hierarchical summarization over traditional flat summaries in terms of user navigation and exploration?
5. How does the hierarchical structure help in managing large document collections without overwhelming the user?
6. What are the different organizing principles that can be used in hierarchical summarization, and how are they selected for different portions of the hierarchy?

### Point 2: SUMMA System Implementation

1. What is the role of temporal clustering in the SUMMA system, and how is it imple

In [33]:
selected_questions = input('Input the numbers of questions you would like to get answers to: ')
question_answering_prompt = f'''Answer questions {selected_questions} according to the paper. 
Each answer should be one sentence long. Output only the answers without enumeration, each from a new line.'''
chat_history += [AIMessage(content=more_questions), HumanMessage(content=question_answering_prompt)]
answers = model.invoke(chat_history).content

Input the numbers of questions you would like to get answers to:  1.4, 2.3, 2.5, 2.6, 3.1, 3.2, 3.6
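
Selections such as `1.4, 2.3` combine a point number with a question number. If the selection needed to be validated or processed in code rather than passed verbatim to the model, it could be parsed as follows. This is a sketch only; `parse_selection` is a hypothetical helper that the notebook itself does not use.

```python
def parse_selection(selection: str):
    """Turn '1.4, 2.3' into [(1, 4), (2, 3)] and '1, 2, 5' into [(1,), (2,), (5,)]."""
    parsed = []
    for item in selection.split(','):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled commas
        # int() raises ValueError on non-numeric input, which surfaces typos early.
        parsed.append(tuple(int(part) for part in item.split('.')))
    return parsed

print(parse_selection('1.4, 2.3, 2.5'))
print(parse_selection('1, 2, 5'))
```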


In [34]:
print(answers)

Hierarchical summarization allows users to start with a general overview and drill down into more detailed information on topics of interest, making it easier to manage and understand large document collections.
The SUMMA system uses an objective function that balances salience and coherence, treating redundancy and budget as hard constraints, to generate high-quality summaries.
The process of hierarchical clustering in the SUMMA system involves recursively clustering sentences based on temporal information, using a method that automatically chooses the appropriate number of clusters at each split.
The SUMMA system summarizes within the hierarchy by following the hierarchical structure of the clustering, where each node has an associated flat summary, and the number of sentences in a flat summary is equal to the number of child clusters of the node.
The user study evaluated user preference by asking participants to choose which format they preferred (SUMMA, timelines, or flat summaries

In [35]:
generated_map = dict_to_node({
    'The paper introduces a novel approach to large-scale multi-document summarization called hierarchical summarization, which organizes information in a coherent hierarchy and allows users to navigate and explore details based on their interests.': {
        'Hierarchical summarization organizes information along principles such as time, location, entities, or events, allowing users to navigate from a general overview to more detailed child summaries.': {
            'Hierarchical summarization allows users to start with a general overview and drill down into more detailed information on topics of interest, making it easier to manage and understand large document collections.': {}
        },
        'The SUMMA system implements hierarchical summarization by first clustering sentences temporally and then summarizing these clusters using an objective function that optimizes for salience and coherence.': {
            'The SUMMA system uses an objective function that balances salience and coherence, treating redundancy and budget as hard constraints, to generate high-quality summaries.': {},
            'The process of hierarchical clustering in the SUMMA system involves recursively clustering sentences based on temporal information, using a method that automatically chooses the appropriate number of clusters at each split.': {},
            'The SUMMA system summarizes within the hierarchy by following the hierarchical structure of the clustering, where each node has an associated flat summary, and the number of sentences in a flat summary is equal to the number of child clusters of the node.': {}
        },
        'The user study found that participants preferred SUMMA over timelines three times as often and over flat summaries ten times as often, and they learned just as much from SUMMA as from flat summaries.': {
            'The user study evaluated user preference by asking participants to choose which format they preferred (SUMMA, timelines, or flat summaries) and to explain why.': {},
            'The user study measured the effectiveness of SUMMA in helping users learn about complex topics by asking participants to write a paragraph summarizing the information they learned after reading the summaries.': {},
            'The user study compared the informativeness and coherence of SUMMA summaries with those of timelines and flat summaries by using ROUGE scores for coverage and manual evaluation for recall of important events.': {}
        }
    }
})
score = -text_tree_distance(generated_map, reference_tree, scoring_model.encode, cos_dist)

In [36]:
print(score)

-3.9009242057800293


This approach looks more promising for generating a map closer to what the user would have made themselves. Let's try to automate it:

In [75]:
# First prompt to make the LLM familiar with the text, explain the task and generate the main idea (root node of the summary).
def start_prompt(document_text: str):
    return f'''Carefully study the following research paper:
    {document_text}
    This is the end of the paper. Now we are going to make a hierarchical summary of the paper.
    The purpose of this summary is to systematize the research study, going from key points to more specific details.
    The result should be a hierarchical list of one-sentence points based on the paper. They shouldn't necessarily be sentences from the text.
    Organize the summary by enumerating it like in the following example:
    This is the main idea.
    1. This is key point number 1.
    1.1. This is a detail for point number 1.
    1.2. This is another detail for point number 1.
    2. This is another key point.
    2.1. This is a detail for point number 2.
    We are going to generate the summary iteratively in the following way:
    - You write out the main idea of the text and generate questions aimed to provide details to the main idea.
    - The user picks the questions they want answered.
    - You answer the questions, each in one sentence, and add them to the summary under the main idea as the first level of hierarchy in the summary.
    - The user picks a point they want details on.
    - You generate questions aimed to provide details for this point.
    - The user picks the questions that interest them.
    - You answer each question in one sentence and add these sentences to the summary under the point which you generated questions for.
    - The process repeats until the user tells you that the summary is complete.
    Let's start now. Write out the main idea of the paper in one sentence. Output only this sentence.'''


# Prompt to generate questions for the second layer of the summary.
main_idea_questions_prompt = '''Now, generate 6 questions aimed to add details to the main idea.
    Enumerate the questions and output only them.'''


# Prompt to generate questions for point with given number.
def point_questions_prompt(point_number):
    return f'''Generate 6 questions aimed to add details to point number {point_number}.
    Enumerate the questions and output only them.'''


# Prompt to answer questions with given numbers.
def answer_questions_prompt(selected_questions):
    return f'''Now answer questions {selected_questions} and add the answers to the summary.
    Add them under the point for which you generated the questions previously.
    The answer to each question should be only one sentence long and add only one point to our summary.
    Make sure ALL the points we are adding to the summary right now are on ONE level of hierarchy.
    Output the whole summary we have thus far made and only it. Make sure the enumeration in the summary is as in the example I provided.'''


# Last prompt to convert the mind map to the format we are working with.
convert_map_prompt = '''We have finished building our summary. Now your job is to convert it to a JSON text tree.
Output it in the JSON format as in the following example:
{
  "A new algorithm for text tree edit distance based on Zhang-Shasha's algorithm and BERT-like model embedding similarity.": {
    "The algorithm's novelty is in its similarity measure based on BERT-like model embeddings.": {
      "Embedding distance is used as a measure of semantic similarity.": {},
      "The language model allows to capture semantic meaning of sentences and model their similarity.": {}
    },
    "Zhang-Shasha's algorithm is used to compute tree edit distance with new edit costs.": {
      "Semantic similarity is used as the update cost in the algorithm.": {},
      "The costs of insertion and removal of nodes are defined as the similarity of the node and an empty sentence.": {}
    },
    "The proposed algorithm is presented as a more informative metric of similarity between text trees.": {
      "The current ways of comparing text trees overlook their tree structure or the meaning of their labels.": {},
      "This new method can be used, for example, to compare mind maps or hierarchical summaries.": {}
    }
  }
}
Output only the mind map we made in this format.'''


# This prompt is for cases when the model outputs the mind map in the wrong format
wrong_format_prompt = '''It seems that you have output the map in the wrong format. Take a look at the example again:
{
  "A new algorithm for text tree edit distance based on Zhang-Shasha's algorithm and BERT-like model embedding similarity.": {
    "The algorithm's novelty is in its similarity measure based on BERT-like model embeddings.": {
      "Embedding distance is used as a measure of semantic similarity.": {},
      "The language model allows to capture semantic meaning of sentences and model their similarity.": {}
    },
    "Zhang-Shasha's algorithm is used to compute tree edit distance with new edit costs.": {
      "Semantic similarity is used as the update cost in the algorithm.": {},
      "The costs of insertion and removal of nodes are defined as the similarity of the node and an empty sentence.": {}
    },
    "The proposed algorithm is presented as a more informative metric of similarity between text trees.": {
      "The current ways of comparing text trees overlook their tree structure or the meaning of their labels.": {},
      "This new method can be used, for example, to compare mind maps or hierarchical summaries.": {}
    }
  }
}
This is the map in the JSON format we should get from the following hierarchical summary:
A new algorithm for text tree edit distance based on Zhang-Shasha's algorithm and BERT-like model embedding similarity.
1. The algorithm's novelty is in its similarity measure based on BERT-like model embeddings.
1.1. Embedding distance is used as a measure of semantic similarity.
1.2. The language model allows to capture semantic meaning of sentences and model their similarity.
2. Zhang-Shasha's algorithm is used to compute tree edit distance with new edit costs.
2.1. Semantic similarity is used as the update cost in the algorithm.
2.2. The costs of insertion and removal of nodes are defined as the similarity of the node and an empty sentence.
3. The proposed algorithm is presented as a more informative metric of similarity between text trees.
3.1. The current ways of comparing text trees overlook their tree structure or the meaning of their labels.
3.2. This new method can be used, for example, to compare mind maps or hierarchical summaries.

Now, given the example, output the JSON map in the correct format.'''
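
Because the enumerated summary has a regular structure, the final conversion to JSON could in principle also be done deterministically instead of by the model. The sketch below assumes the summary follows the numbering from the example (a first unnumbered line with the main idea, then lines such as `1.`, `1.1.` or `2.1`), that parents always appear before their children, and that sibling sentences are unique. `summary_to_dict` is a hypothetical helper, not part of the notebook.

```python
import re

# Matches "1. text", "1.1. text" and "2.1 text"; group 1 is the dotted number.
NUMBERED_LINE = re.compile(r'^(\d+(?:\.\d+)*)\.?\s+(.*)$')

def summary_to_dict(summary: str) -> dict:
    """Convert an enumerated hierarchical summary into a nested dict tree."""
    lines = [line.strip() for line in summary.splitlines() if line.strip()]
    root_children = {}
    tree = {lines[0]: root_children}        # the first line is the main idea
    children_by_path = {(): root_children}  # numbering path -> dict of children
    for line in lines[1:]:
        match = NUMBERED_LINE.match(line)
        if match is None:
            raise ValueError(f'Line is not enumerated: {line!r}')
        path = tuple(int(n) for n in match.group(1).split('.'))
        node_children = {}
        # Attach the sentence under its parent, e.g. 2.1 goes under 2.
        children_by_path[path[:-1]][match.group(2)] = node_children
        children_by_path[path] = node_children
    return tree

example = '''This is the main idea.
1. This is key point number 1.
1.1. This is a detail for point number 1.
1.2. This is another detail for point number 1.
2. This is another key point.
2.1 This is a detail for point number 2.'''

print(summary_to_dict(example))
```

A missing parent (for example `3.1` without a preceding `3.`) raises a `KeyError` in this sketch, which makes malformed summaries easy to detect.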

In [76]:
def generate_summary_iteratively(document_text, model):
    '''
    Generate hierarchical summary of given text by iteratively prompting given LLM. 
    '''
    chat = [HumanMessage(content=start_prompt(document_text))]

    main_idea = model.invoke(chat).content
    print(main_idea)
    
    chat += [
        AIMessage(content=main_idea),
        HumanMessage(content=main_idea_questions_prompt)
    ]

    main_idea_questions = model.invoke(chat).content
    chat.append(AIMessage(content=main_idea_questions))
    print(main_idea_questions)

    while True:
        selected_questions = input('Select questions to answer: ')
        if selected_questions == 'q':
            break
        chat.append(HumanMessage(content=answer_questions_prompt(selected_questions)))

        current_map = model.invoke(chat).content
        chat.append(AIMessage(content=current_map))
        print(current_map)

        selected_point = input('Select point to generate questions for: ')
        if selected_point == 'q':
            break
        chat.append(HumanMessage(content=point_questions_prompt(selected_point)))
        
        new_questions = model.invoke(chat).content
        chat.append(AIMessage(content=new_questions))
        print(new_questions)

    chat.append(HumanMessage(content=convert_map_prompt))

    conversion_successful = False
    while not conversion_successful:
        converted_map = model.invoke(chat).content
        chat.append(AIMessage(content=converted_map))
        
        # Use a name that does not shadow the imported json module.
        json_tree = extract_json_from_text(converted_map)
        if json_tree is not None:
            try:
                tree = dict_to_node(json_tree)
                conversion_successful = True
            except Exception:
                pass

        if not conversion_successful:
            print('Conversion unsuccessful, trying again...')
            chat.append(HumanMessage(content=wrong_format_prompt))

    return tree
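
The conversion loop above retries until the model produces a valid tree, which could run indefinitely if the model keeps failing. One possible safeguard is to cap the number of attempts. This is a sketch under assumptions: `ask_model` is a hypothetical zero-argument callable standing in for `model.invoke(chat).content`, and the limit of 3 attempts is arbitrary.

```python
import json

def convert_with_retries(ask_model, max_attempts: int = 3) -> dict:
    """Call ask_model until its reply contains a JSON object, up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        text = ask_model()
        try:
            parsed = json.loads(text[text.find('{'): text.rfind('}') + 1])
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            return parsed
        print(f'Attempt {attempt} did not yield a valid tree.')
    raise RuntimeError(f'No valid JSON tree after {max_attempts} attempts')

# Stub model for illustration: fails once, then answers with a valid tree.
replies = iter(['Sorry, here is a list instead.', '{"Main idea.": {}}'])
print(convert_with_retries(lambda: next(replies)))
```

In the notebook's setting, each failed attempt would also append the wrong-format prompt to the chat before asking again.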

In [77]:
generated_map = generate_summary_iteratively(document_text, model)

The paper introduces hierarchical summarization, a novel approach to multi-document summarization that organizes large document collections coherently.
1. What is the main problem with existing multi-document summarization systems that hierarchical summarization aims to solve?
2. What are the key characteristics of hierarchical summarization?
3. How does the SUMMA system, the first implementation of hierarchical summarization, operate?
4. What are the main evaluation criteria used to assess the effectiveness of hierarchical summarization in the user study?
5. How does hierarchical summarization compare to traditional multi-document summarization and timelines in terms of user preference and knowledge acquisition?
6. What are the future directions for improving and expanding the hierarchical summarization approach?


Select questions to answer:  2, 3, 5


The paper introduces hierarchical summarization, a novel approach to multi-document summarization that organizes large document collections coherently.
1. Hierarchical summarization organizes summaries along principles such as time, location, entities, or events, with each non-leaf summary linked to child summaries for more details.
2. The SUMMA system, the first implementation of hierarchical summarization, operates by hierarchically clustering sentences by time and summarizing these clusters using an objective function that optimizes salience and coherence.
3. In a user study, hierarchical summarization was preferred over traditional multi-document summarization and timelines, with users learning just as much or more from hierarchical summaries.


Select point to generate questions for:  2


1. What is the first step in the SUMMA system's process for hierarchical summarization?
2. How does the SUMMA system determine the hierarchical structure of the summaries?
3. What objective function does the SUMMA system use to optimize salience and coherence?
4. What method does the SUMMA system use to ensure temporal coherence in the summaries?
5. How does the SUMMA system handle the selection of the number of clusters at each level of the hierarchy?
6. What algorithm does the SUMMA system use to approximate the optimization of the objective function?


Select questions to answer:  2, 3


The paper introduces hierarchical summarization, a novel approach to multi-document summarization that organizes large document collections coherently.
1. Hierarchical summarization organizes summaries along principles such as time, location, entities, or events, with each non-leaf summary linked to child summaries for more details.
2. The SUMMA system, the first implementation of hierarchical summarization, operates by hierarchically clustering sentences by time and summarizing these clusters using an objective function that optimizes salience and coherence.
2.1. The SUMMA system uses hierarchical clustering to organize sentences into manageable, semantically-related sections, inducing a hierarchical structure over the input.
2.2. The SUMMA system's objective function balances salience and coherence, treating redundancy and budget as hard constraints while optimizing for coherence and salience as soft constraints.
3. In a user study, hierarchical summarization was preferred over tradi

Select point to generate questions for:  3


1. What specific metrics were used to evaluate the effectiveness of hierarchical summarization in the user study?
2. How did the user study compare hierarchical summarization to traditional multi-document summarization and timelines?
3. What were the key findings regarding user preference between hierarchical summarization and traditional methods?
4. How was knowledge acquisition assessed in the user study for hierarchical summarization compared to other methods?
5. What were the results of the ROUGE evaluation for hierarchical summarization compared to other methods?
6. How did the manual evaluation of event recall compare between hierarchical summarization and other methods?


Select questions to answer:  1, 3, 4


The paper introduces hierarchical summarization, a novel approach to multi-document summarization that organizes large document collections coherently.
1. Hierarchical summarization organizes summaries along principles such as time, location, entities, or events, with each non-leaf summary linked to child summaries for more details.
2. The SUMMA system, the first implementation of hierarchical summarization, operates by hierarchically clustering sentences by time and summarizing these clusters using an objective function that optimizes salience and coherence.
2.1. The SUMMA system uses hierarchical clustering to organize sentences into manageable, semantically-related sections, inducing a hierarchical structure over the input.
2.2. The SUMMA system's objective function balances salience and coherence, treating redundancy and budget as hard constraints while optimizing for coherence and salience as soft constraints.
3. In a user study, hierarchical summarization was preferred over tradi

Select point to generate questions for:  q


In [78]:
print(generated_map)

The paper introduces hierarchical summarization, a novel approach to multi-document summarization that organizes large document collections coherently.
-Hierarchical summarization organizes summaries along principles such as time, location, entities, or events, with each non-leaf summary linked to child summaries for more details.
-The SUMMA system, the first implementation of hierarchical summarization, operates by hierarchically clustering sentences by time and summarizing these clusters using an objective function that optimizes salience and coherence.
--The SUMMA system uses hierarchical clustering to organize sentences into manageable, semantically-related sections, inducing a hierarchical structure over the input.
--The SUMMA system's objective function balances salience and coherence, treating redundancy and budget as hard constraints while optimizing for coherence and salience as soft constraints.
-In a user study, hierarchical summarization was preferred over traditional multi

In [79]:
score = -text_tree_distance(generated_map, reference_tree, scoring_model.encode, cos_dist)
print(score)

-3.3494465947151184
