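## Loading a paper from arXiv and summarizing it
The arXiv paper "Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference" (arXiv ID 2501.12959) is written in English. This example loads the paper and produces a summary of it in Chinese.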

In [20]:
from pprint import pprint
from langchain_community.document_loaders import ArxivLoader
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.text_splitter import RecursiveCharacterTextSplitter
from chat_model_client import get_model

1. Load the paper

In [21]:
loader = ArxivLoader(query="2501.12959", load_max_docs=1)
docs = loader.load()
pprint(docs[0])

Document(metadata={'Published': '2025-01-22', 'Title': 'Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference', 'Authors': 'Weizhi Fei, Xueyan Niu, Guoqing Xie, Yingqing Liu, Bo Bai, Wei Han', 'Summary': 'Although applications involving long-context inputs are crucial for the\neffective utilization of large language models (LLMs), they also result in\nincreased computational costs and reduced performance. To address this\nchallenge, we propose an efficient, training-free prompt compression method\nthat retains key information within compressed prompts. We identify specific\nattention heads in transformer-based LLMs, which we designate as evaluator\nheads, that are capable of selecting tokens in long inputs that are most\nsignificant for inference. Building on this discovery, we develop EHPC, an\nEvaluator Head-based Prompt Compression method, which enables LLMs to rapidly\n"skim through" input prompts by leveraging only the first few layers with\neval

2. Split the text
(Here the text is split by character count; splitting by token count is another option.)

In [22]:
splitter = RecursiveCharacterTextSplitter(chunk_size=256, chunk_overlap=2)
texts = splitter.split_documents(docs)
# pprint(texts)
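
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal standard-library sketch of fixed-size splitting with overlap. It is a simplification: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line boundaries, which this sketch does not attempt. The function name `split_by_chars` is hypothetical and not part of LangChain.

```python
def split_by_chars(text: str, chunk_size: int = 256, chunk_overlap: int = 2) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Hypothetical helper,
    not the algorithm used by RecursiveCharacterTextSplitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


demo_chunks = split_by_chars("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)
```

With an overlap of only 2 characters, as in the cell above, neighbouring chunks share almost no context; a larger overlap trades extra tokens for fewer sentences cut at chunk boundaries.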

3. Build the chain

In [23]:
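# Join the page content of all chunks into a single string
content = lambda docs: "\n\n".join(doc.page_content for doc in docs)
# Prompt (in Chinese): "Summarize the following content in Chinese, in no more than 140 characters"
prompt = PromptTemplate.from_template("请使用中文总结以下内容，控制在140个字以内：\n\n{content}")
llm = get_model('openai')

# Chain: join chunks -> fill the prompt -> call the model -> parse the reply into a str
chain = (
    {"content": content}
    | prompt
    | llm
    | StrOutputParser()
)

# Only the first 50 chunks are passed to the model
pprint(chain.invoke(texts[:50]))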




'本文提出了一种高效的、无需训练的提示压缩方法EHPC，通过评估头部在长文本输入中选择最重要的令牌，从而加速长文本推理。EHPC在两个主流基准测试中取得了最先进的结果，有效降低了商业API调用的复杂性和成本。与基于键值缓存的加速方法相比，EHPC具有竞争力，有望提高LLM在长文本任务中的效率。EHPC通过评估头部选择重要令牌，加速长文本推理，降低内存使用，并与KV缓存压缩方法竞争。EHPC在提示压缩基准测试上取得了新的最先进性能，降低了商业LLM的API成本和内存使用。'
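
The chain above composes its stages with the `|` operator. As a rough illustration of that idea only, the sketch below builds a four-stage pipeline from plain Python. The `Step` class, the English prompt text, and the stubbed model are all hypothetical stand-ins chosen for this example; this is not how LangChain implements its runnables, and no real model is called.

```python
from typing import Any, Callable


class Step:
    """A toy pipeline stage that composes with `|` (hypothetical, not LangChain's Runnable)."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Left-to-right composition: run self first, then feed the result to other.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Stage 1: join the chunk texts and expose them under the "content" key.
join_chunks = Step(lambda chunks: {"content": "\n\n".join(chunks)})
# Stage 2: fill the prompt template with the joined text.
fill_prompt = Step(
    lambda variables: "Summarize the following content:\n\n{content}".format(**variables)
)
# Stage 3: stub standing in for the chat model; it only reports the prompt length.
fake_llm = Step(lambda prompt_text: {"text": f"[summary of {len(prompt_text)} characters]"})
# Stage 4: extract the plain string, the role StrOutputParser plays in the real chain.
to_str = Step(lambda message: message["text"])

toy_chain = join_chunks | fill_prompt | fake_llm | to_str
print(toy_chain.invoke(["first chunk", "second chunk"]))
```

Each stage only needs to accept the previous stage's output, which is why the real chain can swap in a different model or parser without touching the other stages.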


4. Inspect the chain

In [24]:
# print_ascii() prints the graph itself and returns None, so it is not wrapped in pprint
chain.get_graph().print_ascii()

+------------------------+
| Parallel<content>Input |
+------------------------+
             *
             *
             *
        +--------+
        | Lambda |
        +--------+
             *
             *
             *
    +----------------+
    | PromptTemplate |
    +----------------+
             *
             *
             *
      +------------+
      | ChatOpenAI |
      +------------+
             *
             *
             *
    +-----------------+
    | StrOutputParser |
    +-----------------+
             *
             *
             *
+-----------------------+
| StrOutputParserOutput |
+-----------------------+
