<a href="https://colab.research.google.com/github/maninog/langchain/blob/main/LangChain_Chains_Long_Text_QA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LangChain Chains (Long-Text QA)

## README
- author: [Masumi Morishige](https://twitter.com/masumi_creator)
- created_at: 2023-05-04
- updated_at: 2023-06-28

### How to run
1. Issue an OpenAI API key.
2. Replace the `...` in `os.environ["OPENAI_API_KEY"] = "..."` with your own API key.
3. Run "Runtime > Run all".
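Step 2 hard-codes the key into the notebook, and a hard-coded key is easy to leak when the notebook is shared. The sketch below is an alternative. It assumes you run it in place of the `os.environ["OPENAI_API_KEY"] = "..."` line. It uses Python's standard `getpass` to read the key, and only prompts when the key is not already set. `ensure_openai_api_key` is a hypothetical helper and is not part of LangChain or the `openai` package.

```python
import os
from getpass import getpass


def ensure_openai_api_key() -> str:
    """Hypothetical helper: return OPENAI_API_KEY, prompting only if it is unset or still the "..." placeholder."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key or key == "...":
        # getpass does not echo the typed key, so it does not end up in the cell's source
        key = getpass("OpenAI API key: ")
        os.environ["OPENAI_API_KEY"] = key
    return key
```

Calling `ensure_openai_api_key()` once before the chains run should be enough, because `ChatOpenAI` reads `OPENAI_API_KEY` from the environment.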

### References
- Zenn: [in preparation]
- YouTube: [How to implement an FAQ chatbot with no character limit (Python / LangChain / ChatGPT)](https://youtu.be/MoQcV4s7hQw)

### How to issue an OpenAI API key

[<img src="https://img.youtube.com/vi/frpsKLNW1q4/maxresdefault.jpg" width="600px">](https://youtu.be/frpsKLNW1q4)

[For engineers: How to integrate the OpenAI API (environment setup + embedding it in a Google Document with GAS)](https://youtu.be/frpsKLNW1q4)

## Environment setup

In [None]:
!pip install langchain==0.0.149

In [None]:
!pip install openai==0.27.6

In [None]:
import os

# TODO: Replace "..." with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = "..."

In [None]:
!pip install youtube-transcript-api==0.6.1

In [None]:
!pip install tiktoken==0.4.0
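The cells above pin exact versions. The runtime may already have other versions installed, so a quick check can confirm the pins actually took effect. This sketch uses only the standard library's `importlib.metadata`. `PINNED` and `check_pins` are hypothetical names, and the version numbers are copied from the install cells.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the pip install cells above
PINNED = {
    "langchain": "0.0.149",
    "openai": "0.27.6",
    "youtube-transcript-api": "0.6.1",
    "tiktoken": "0.4.0",
}


def check_pins(pins):
    """Return {package: installed_version_or_None} for every package that does not match its pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


print(check_pins(PINNED) or "all pinned versions installed")
```

If the check reports a mismatch right after installing, restarting the runtime is usually needed so the new versions are the ones imported.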

## Implementation
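All three cells in this section prepare the transcript the same way before the chain runs. `CharacterTextSplitter(separator=" ", chunk_size=500)` cuts the text at spaces and packs the pieces into chunks of up to roughly 500 characters. Below is a simplified pure-Python sketch of that packing idea. `split_on_separator` is a hypothetical helper, not LangChain's code. Unlike LangChain's splitter, it ignores `chunk_overlap` and does not warn when a single piece is longer than `chunk_size`.

```python
from typing import List


def split_on_separator(text: str, separator: str = " ", chunk_size: int = 500) -> List[str]:
    """Greedy sketch: join separator-delimited pieces until one more would exceed chunk_size."""
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        extra = len(piece) + (len(separator) if current else 0)
        if current and length + extra > chunk_size:
            # The current chunk is full: emit it and start a new one with this piece
            chunks.append(separator.join(current))
            current, length = [], 0
            extra = len(piece)
        current.append(piece)
        length += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


print(split_on_separator("aa bb cc dd", " ", 5))  # ['aa bb', 'cc dd']
```

A piece longer than `chunk_size` still becomes its own chunk rather than being cut mid-word.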

### 1. map_reduce

In [None]:
from langchain.chat_models import ChatOpenAI

from langchain.document_loaders import YoutubeLoader

from langchain.text_splitter import CharacterTextSplitter

from langchain.docstore.document import Document
from langchain.chains.question_answering import load_qa_chain

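import time
start_time = time.time()

# Source video: "How to implement a ChatGPT that has learned from YouTube (Python / LangChain / YouTube)"
youtube_url = "https://www.youtube.com/watch?v=TQvaocfmvaI"
# Load the video's Japanese transcript as a single Document
loader = YoutubeLoader.from_youtube_url(youtube_url, language="ja")
transcript_text = loader.load()[0].page_content
print(f"{transcript_text = }")
print(f"{len(transcript_text) = }")

# Split the transcript at spaces into chunks of roughly 500 characters each
text_splitter = CharacterTextSplitter(separator=" ", chunk_size=500)
texts = text_splitter.split_text(transcript_text)

print(f"{len(texts) = }")

# Wrap each chunk in a Document, the input type the QA chain expects
docs = [Document(page_content=t) for t in texts]

chat = ChatOpenAI(model_name="gpt-3.5-turbo")
# map_reduce: answer the question against each chunk separately (map),
# then combine the per-chunk answers into a single answer (reduce)
chain = load_qa_chain(
    llm=chat,
    chain_type="map_reduce",
    verbose=True,
)

# "Tell me which libraries need to be installed to implement a ChatGPT that has learned from YouTube."
question = "YouTubeを学習したChatGPTを実装するために、インストールが必要なライブラリを教えて。"

output = chain(
    {
        "input_documents": docs,
        "question": question,
    },
    return_only_outputs=True,
)["output_text"]
print(output)

print(f"elapsed: {time.time() - start_time:.1f} s")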
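The map_reduce idea can be shown without LangChain. Here is a pure-Python sketch of the pattern. The `llm` argument is any function from prompt to text, and in the test it is a stand-in. The prompts are made up for illustration. LangChain's real implementation uses its own prompts and may add steps, such as collapsing intermediate answers that get too long.

```python
from typing import Callable, List


def map_reduce_qa(chunks: List[str], question: str, llm: Callable[[str], str]) -> str:
    """Sketch of map_reduce QA: one call per chunk, then one call to combine the results."""
    # Map: ask the question against each chunk independently
    partial_answers = [
        llm(f"Context:\n{chunk}\n\nQuestion: {question}\nRelevant answer:")
        for chunk in chunks
    ]
    # Reduce: merge the per-chunk answers into one final answer
    combined = "\n".join(partial_answers)
    return llm(f"Partial answers:\n{combined}\n\nQuestion: {question}\nFinal answer:")
```

This makes `len(chunks) + 1` model calls. The map calls do not depend on each other, so the pattern lends itself to running them in parallel.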


### 2. map_rerank

In [None]:
from langchain.chat_models import ChatOpenAI

from langchain.document_loaders import YoutubeLoader

from langchain.text_splitter import CharacterTextSplitter

from langchain.docstore.document import Document
from langchain.chains.question_answering import load_qa_chain

import time
start_time = time.time()

# Same source video and chunking as in the map_reduce cell
youtube_url = "https://www.youtube.com/watch?v=TQvaocfmvaI"
loader = YoutubeLoader.from_youtube_url(youtube_url, language="ja")
transcript_text = loader.load()[0].page_content
print(f"{transcript_text = }")
print(f"{len(transcript_text) = }")

text_splitter = CharacterTextSplitter(separator=" ", chunk_size=500)
texts = text_splitter.split_text(transcript_text)

print(f"{len(texts) = }")

docs = [Document(page_content=t) for t in texts]

chat = ChatOpenAI(model_name="gpt-3.5-turbo")
# map_rerank: answer the question against each chunk, have the model score each answer,
# then return the highest-scoring answer
chain = load_qa_chain(
    llm=chat,
    chain_type="map_rerank",
    verbose=True,
)

# Same question as in the map_reduce cell
question = "YouTubeを学習したChatGPTを実装するために、インストールが必要なライブラリを教えて。"

output = chain(
    {
        "input_documents": docs,
        "question": question,
    },
    return_only_outputs=True,
)["output_text"]
print(output)

print(f"elapsed: {time.time() - start_time:.1f} s")
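Here is a pure-Python sketch of the map_rerank pattern. Each chunk gets its own answer plus a self-reported score, and only the best-scoring answer is kept. The `Score: N` output format and the regex are assumptions made for this illustration. LangChain's own prompt and parser differ in their details. `llm` is a stand-in for any prompt-to-text function.

```python
import re
from typing import Callable, List, Tuple

# Assumed output format: the answer text, then "Score: <integer>" on its own line
SCORE_RE = re.compile(r"(.*?)\nScore: (\d+)", re.DOTALL)


def map_rerank_qa(chunks: List[str], question: str, llm: Callable[[str], str]) -> str:
    """Sketch of map_rerank QA: score one answer per chunk and return the best one."""
    scored: List[Tuple[str, int]] = []
    for chunk in chunks:
        raw = llm(
            f"Context:\n{chunk}\n\nQuestion: {question}\n"
            "Give your answer, then 'Score: <0-100>' on a new line."
        )
        match = SCORE_RE.match(raw.strip())
        if match:  # skip outputs that do not follow the expected format
            scored.append((match.group(1).strip(), int(match.group(2))))
    if not scored:
        return ""
    # Keep only the single highest-scoring answer; the other chunks' answers are discarded
    return max(scored, key=lambda pair: pair[1])[0]
```

Because only one chunk's answer survives, this pattern suits questions whose answer sits in a single place. It is a weaker fit when the answer has to be assembled from several parts of the transcript.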


### 3. refine

In [None]:
from langchain.chat_models import ChatOpenAI

from langchain.document_loaders import YoutubeLoader

from langchain.text_splitter import CharacterTextSplitter

from langchain.docstore.document import Document
from langchain.chains.question_answering import load_qa_chain

import time
start_time = time.time()

# Same source video and chunking as in the map_reduce cell
youtube_url = "https://www.youtube.com/watch?v=TQvaocfmvaI"
loader = YoutubeLoader.from_youtube_url(youtube_url, language="ja")
transcript_text = loader.load()[0].page_content
print(f"{transcript_text = }")
print(f"{len(transcript_text) = }")

text_splitter = CharacterTextSplitter(separator=" ", chunk_size=500)
texts = text_splitter.split_text(transcript_text)

print(f"{len(texts) = }")

docs = [Document(page_content=t) for t in texts]

chat = ChatOpenAI(model_name="gpt-3.5-turbo")
# refine: build an initial answer from the first chunk,
# then revise it with each subsequent chunk in order
chain = load_qa_chain(
    llm=chat,
    chain_type="refine",
    verbose=True,
)

# Same question as in the map_reduce cell
question = "YouTubeを学習したChatGPTを実装するために、インストールが必要なライブラリを教えて。"

output = chain(
    {
        "input_documents": docs,
        "question": question,
    },
    return_only_outputs=True,
)["output_text"]
print(output)

print(f"elapsed: {time.time() - start_time:.1f} s")
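Here is a pure-Python sketch of the refine pattern. The running answer is threaded through the chunks one at a time. As in the other sketches, `llm` is a stand-in prompt-to-text function and the prompts are illustrative rather than LangChain's own.

```python
from typing import Callable, List


def refine_qa(chunks: List[str], question: str, llm: Callable[[str], str]) -> str:
    """Sketch of refine QA: one sequential call per chunk, each revising the previous answer."""
    if not chunks:
        return ""
    # Initial answer from the first chunk only
    answer = llm(f"Context:\n{chunks[0]}\n\nQuestion: {question}\nAnswer:")
    # Each later chunk gets a chance to revise the running answer
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context:\n{chunk}\n\n"
            f"Refine the existing answer to the question '{question}' "
            "if the new context helps; otherwise repeat it unchanged."
        )
    return answer
```

Each call needs the previous call's result, so the calls cannot run in parallel. That is one plausible reason the refine cell's elapsed time can be longer than map_reduce's on the same chunks.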


In [None]:
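from langchain.schema import HumanMessage

# The refine chain's default prompts are written in English, so its answer may come back in English.
# Ask the chat model to translate that answer into Japanese.
prompt = f"次の文章を和訳して。\n{output}"  # "Translate the following text into Japanese."
print(chat([HumanMessage(content=prompt)]).content)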

### How to develop a web app

[<img src="https://img.youtube.com/vi/Cod-3ymwvsQ/maxresdefault.jpg" width="600px">](https://youtu.be/Cod-3ymwvsQ)

[Python x LangChain x Streamlit x OpenAI API: An introduction to building a ChatGPT web app](https://youtu.be/Cod-3ymwvsQ)