<a href="https://colab.research.google.com/github/maninog/langchain/blob/main/LangChain_Indexes_Webpage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LangChain Indexes (Webpage)

## README
- author: [Masumi Morishige](https://twitter.com/masumi_creator)
- created_at: 2023-04-23
- updated_at: 2023-06-29

### How to run
1. Issue an OpenAI API key
2. Replace the `"..."` placeholder in `os.environ["OPENAI_API_KEY"] = "..."` with your own API key
3. Select "Runtime > Run all"
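
If you would rather not paste the key into the notebook itself, the following is a minimal sketch of an alternative. The helper name `ensure_api_key` is hypothetical (it is not part of LangChain or the OpenAI library), and treating `"..."` as "not set yet" is an assumption based on the placeholder used in this notebook.

```python
import os
from getpass import getpass


def ensure_api_key(env_var: str = "OPENAI_API_KEY", prompt=getpass) -> str:
    """Return the key stored in env_var, prompting for it only if it is missing.

    Hypothetical helper for illustration; `prompt` is injectable so the
    function can be exercised without an interactive terminal.
    """
    key = os.environ.get(env_var, "").strip()
    if not key or key == "...":
        # Nothing usable is set yet (empty or still the notebook placeholder),
        # so ask for the key without echoing it on screen.
        key = prompt(f"Enter {env_var}: ").strip()
        if not key:
            raise ValueError(f"{env_var} must not be empty")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` in place of the hard-coded assignment keeps the key out of the saved notebook, which matters if the notebook is shared or committed to GitHub.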

### References
- Zenn: [in progress]
- YouTube: [How to teach ChatGPT the contents of a web page (Python / LangChain / FAQ)](https://youtu.be/qdvgpoVqfzs)

### How to issue an OpenAI API key

[<img src="https://img.youtube.com/vi/frpsKLNW1q4/maxresdefault.jpg" width="600px">](https://youtu.be/frpsKLNW1q4)

[For engineers: how to integrate with the OpenAI API (environment setup + embedding into Google Docs with GAS)](https://youtu.be/frpsKLNW1q4)

## Environment setup

In [None]:
!pip install langchain==0.0.145

In [None]:
!pip install openai==0.27.8

In [None]:
import os

# TODO: set your own API key here
os.environ["OPENAI_API_KEY"] = "..."

In [None]:
!pip install chromadb==0.3.26

In [None]:
!pip install tiktoken==0.4.0

In [None]:
!pip install unstructured==0.7.10

## Implementation

In [None]:
from langchain.document_loaders import UnstructuredURLLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

# Zenn articles (written in Japanese) to index
urls = [
    "https://zenn.dev/umi_mori/articles/what-is-gpt-4", # What is GPT-4? (overview and usage / comparison with GPT-3.5 / using it in ChatGPT)
    "https://zenn.dev/umi_mori/articles/chatgpt-api-python", # ChatGPT API overview and usage (with Python code)
    "https://zenn.dev/umi_mori/articles/chatgpt-google-chrome-plugins", # 7 handy ChatGPT plugins (Google Chrome extensions)
]

loader = UnstructuredURLLoader(urls=urls)
# Print the loaded documents for inspection; from_loaders() below loads them again.
print(loader.load())

# Split on newlines into chunks of roughly 300 characters, with no overlap
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=300,
    chunk_overlap=0,
    length_function=len,
)

# Embed the chunks with OpenAI and store them in a Chroma vector store
index = VectorstoreIndexCreator(
    vectorstore_cls=Chroma, # Default
    embedding=OpenAIEmbeddings(), # Default
    text_splitter=text_splitter,
).from_loaders([loader])

# "Which handy ChatGPT plugin is introduced seventh?" (kept in Japanese to match the articles)
query = "7番目に紹介しているChatGPT便利プラグインは？"

answer = index.query(query)
print(answer)
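
To make the effect of `separator="\n"`, `chunk_size=300`, and `chunk_overlap=0` easier to picture, here is a simplified, dependency-free sketch of newline-based chunking. It is an illustration only, not LangChain's actual `CharacterTextSplitter` implementation, and the function name `split_text` is made up for this example.

```python
from typing import List


def split_text(text: str, separator: str = "\n", chunk_size: int = 300) -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    Simplified illustration with no overlap between chunks.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        if not piece.strip():
            continue  # skip empty lines
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            # Either the piece still fits, or it is a single oversized piece
            # that becomes its own chunk because it cannot be split further.
            current = candidate
        else:
            # The chunk is full: emit it and start a new one with this piece.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


# With a tiny chunk_size the packing behaviour is easy to see.
print(split_text("aaaa\nbbbb\ncccc", chunk_size=9))
```

Because the text is only ever cut at the separator, a single line longer than `chunk_size` stays in one piece in this sketch. That is why a chunk can end up larger than the configured size.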

In [None]:
!pip install nest_asyncio==1.5.6

In [None]:
from langchain.document_loaders.sitemap import SitemapLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

import nest_asyncio

# Allow the loader's async fetching to run inside the notebook's existing event loop
nest_asyncio.apply()

loader = SitemapLoader(web_path="https://langchain.readthedocs.io/sitemap.xml")

# Print the loaded documents for inspection; from_loaders() below loads them again.
print(loader.load())

# Split on newlines into chunks of roughly 300 characters, with no overlap
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=300,
    chunk_overlap=0,
    length_function=len,
)

# Embed the chunks with OpenAI and store them in a Chroma vector store
index = VectorstoreIndexCreator(
    vectorstore_cls=Chroma, # Default
    embedding=OpenAIEmbeddings(), # Default
    text_splitter=text_splitter,
).from_loaders([loader])

query = "What is LangChain?"

answer = index.query(query)
print(answer)
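
A sitemap is an XML file that lists page URLs in `<loc>` elements, and a sitemap-based loader has to read that list before it can fetch any page. The sketch below shows only that first step, using the standard library. It is not how `SitemapLoader` is implemented; the function name `extract_urls`, the `must_contain` filter, and the `example.com` URLs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str, must_contain: str = "") -> List[str]:
    """Return the <loc> URLs in a sitemap, optionally keeping only those containing a substring."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if must_contain in url]


# Made-up sitemap used only for this example
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/blog/post</loc></url>
</urlset>"""

print(extract_urls(SAMPLE, must_contain="/docs/"))
```

Narrowing the URL list before loading can matter for large sites, since every listed page is fetched and embedded, and each embedding call is billed against your OpenAI API key.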

### How to build a web app

[<img src="https://img.youtube.com/vi/Cod-3ymwvsQ/maxresdefault.jpg" width="600px">](https://youtu.be/Cod-3ymwvsQ)

[Python x LangChain x Streamlit x OpenAI API: an introduction to building a ChatGPT web app](https://youtu.be/Cod-3ymwvsQ)