# Text summarization

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/fuyu-quant/langchain/blob/main/examples/text_summarization.ipynb)

In [4]:
%%capture
!pip install langchain
!pip install openai
!pip install tiktoken

In [5]:
from langchain.chains.summarize import load_summarize_chain

from langchain.text_splitter import CharacterTextSplitter
from langchain.docstore.document import Document

from langchain.llms import OpenAI

In [6]:
file_path = "/home/jovyan/langchain/data/sample.txt"


# Prepare the data
with open(file_path) as f:
    dragonball_txt = f.read()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
texts = text_splitter.split_text(dragonball_txt)

docs = [Document(page_content=t) for t in texts[:2]]

In [10]:
llm = OpenAI(temperature=0, model_name="text-davinci-002")

### Stuff
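* Packs all of the relevant data into the prompt as context and passes it to the language model
    * Pros: only a single call to the LLM is made, and the LLM can see all of the data at once when generating text
    * Cons: does not work on large data because of the LLM's context-length limit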
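
As a rough illustration of the idea above, and not LangChain's actual implementation, the sketch below joins every chunk into one prompt and calls the model once. The names `stuff_summarize` and `echo_llm`, the prompt wording, and the character-based `max_chars` limit are all assumptions made for this example; real context limits are counted in tokens.

```python
from typing import Callable, List


def stuff_summarize(chunks: List[str], llm: Callable[[str], str], max_chars: int = 4000) -> str:
    """Put every chunk into a single prompt and call the model exactly once."""
    context = "\n\n".join(chunks)
    prompt = f"Write a concise summary of the following:\n\n{context}\n\nCONCISE SUMMARY:"
    if len(prompt) > max_chars:
        # Stand-in for the context-length limit (assumed, measured in characters here)
        raise ValueError("prompt exceeds the assumed context limit")
    return llm(prompt)


# Hypothetical stand-in for a real model: it only records the prompts it receives
calls: List[str] = []


def echo_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"[summary of {len(prompt)} characters]"


result = stuff_summarize(["first chunk", "second chunk"], echo_llm)
print(result)
print("LLM calls:", len(calls))
```

Because everything goes into one prompt, the input size is bounded by whatever the model accepts, which is the drawback noted in the list above.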

In [12]:
# Set up load_summarize_chain with the "stuff" chain type
chain = load_summarize_chain(llm, chain_type="stuff")

# Run the summarization
print(chain.run(docs))

Kaggle is a company that hosts competitions for data scientists to build models that solve problems. The company has a large number of employees who are passionate about data analysis.

The Kaggle Meet up is an event that was organized for employees to interact with each other and learn more about the company. The event was held both offline and online to accommodate for different preferences.

Despite the bad weather, approximately 30 people attended the offline event, and around 90 people attended the online event. The event was a success, with employees and new graduates enjoying the talks and networking with each other.


### Map Reduce
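* Splits the relevant data into chunks, builds a prompt for each chunk, and passes each one to the language model. A final prompt that combines those results is then passed to the language model
    * Pros: can handle larger data than Stuffing. The per-chunk LLM calls can be run in parallel.
    * Cons: requires more LLM calls than Stuffing. Some information is also lost in the final combine step. Chat models are not yet supported.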
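
The two steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not LangChain's implementation: `map_reduce_summarize`, `echo_llm`, and both prompt strings are hypothetical, and a thread pool stands in for whatever parallelism a real chain might use.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def map_reduce_summarize(chunks: List[str], llm: Callable[[str], str]) -> str:
    """Summarize each chunk independently, then combine the partial summaries."""
    # Map step: one independent call per chunk, so the calls can run in parallel
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(lambda c: llm(f"Summarize:\n\n{c}"), chunks))
    # Reduce step: one more call that merges the partial summaries
    combined = "\n\n".join(partial)
    return llm(f"Combine these summaries into one:\n\n{combined}")


# Hypothetical stand-in for a real model: it only records the prompts it receives
calls: List[str] = []


def echo_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"[summary of {len(prompt)} characters]"


chunks = ["chunk one", "chunk two", "chunk three"]
result = map_reduce_summarize(chunks, echo_llm)
print(result)
print("LLM calls:", len(calls))  # one per chunk plus one for the reduce step
```

The reduce step only sees the partial summaries, not the original chunks, which is where detail can be lost.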


In [11]:
# Set up load_summarize_chain with the "map_reduce" chain type
chain = load_summarize_chain(llm, chain_type="map_reduce")

# Run the summarization
print(chain.run(docs))

Kaggle is a company that hosts competitions for data scientists to build the best machine learning models to solve real-world problems. Kaggle has over 200 data scientists on staff, and many more Kaggle users who are passionate about data analysis.

The Kaggle Meet up is an event for employees to learn more about Kaggle and data science competitions. The event was held both offline and online to accommodate different interests and schedules.

Despite the bad weather, approximately 30 people gathered in the seminar room, and 90 employees total participated in the event online, including new graduates. Some participants came from customer sites to the headquarters office, looking forward to the networking event.

The event started with a series of lightning talks by 4 people with a wide range of Kaggle experience, from 1 year to Kaggle master. Topics included motivations for participating in Kaggle, how to choose competitions, results, how to approach Kaggle, what was gained from parti