### Extracting .ipynb documents from GitHub

🌐 Target: notebook files in specific repositories

GIT_REPOS = [
    ("https://github.com/langchain-ai/langchain", "master", "langchain"),
    # more repositories can be added here
]

⸻

🛠 Using GitLoader
	•	Clone the GitHub repository locally
	•	Use file_filter to **keep only .ipynb files**

loader = GitLoader(
    repo_path="/tmp/langchain",
    clone_url=repo_url,
    branch=branch,
    file_filter=lambda f: f.endswith(".ipynb")
)
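GitLoader calls `file_filter` with each file path as a string, so the predicate can be written and checked on its own. The sketch below is a slightly stricter variant of the lambda above: it also skips Jupyter checkpoint copies. The function name `is_source_notebook` and the checkpoint rule are assumptions for illustration, not something this notebook does, and the code assumes POSIX-style paths such as `/tmp/langchain/...`.

```python
from pathlib import PurePosixPath


def is_source_notebook(file_path: str) -> bool:
    """Return True for .ipynb files that are not Jupyter checkpoint copies."""
    path = PurePosixPath(file_path)
    # Checkpoint copies live in a directory named ".ipynb_checkpoints"
    return path.suffix == ".ipynb" and ".ipynb_checkpoints" not in path.parts


# Hypothetical paths, shaped like the ones a clone under /tmp/langchain would produce
candidates = [
    "/tmp/langchain/docs/quickstart.ipynb",
    "/tmp/langchain/docs/.ipynb_checkpoints/quickstart-checkpoint.ipynb",
    "/tmp/langchain/README.md",
]
selected = [p for p in candidates if is_source_notebook(p)]
print(selected)
```

A named function like this could be passed as `file_filter=is_source_notebook` in place of the lambda.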

⸻

### Loading
	•	Fetch the notebooks from the specified repository
	•	Multiple repositories can be handled
	•	The results are merged into a single document list

docs = loader.load()



⸻

 Result: the list of notebooks is stored in docs
 

⸻


In [1]:
import re
import json
import tiktoken
from pathlib import Path
from typing import List
from langchain_core.documents import Document
from langchain_community.document_loaders import GitLoader

# === Extract .ipynb files from the GitHub repositories ===
GIT_REPOS = [
    ("https://github.com/langchain-ai/langchain", "master", "langchain"),
    #("https://github.com/langchain-ai/langgraph", "main", "langgraph")
]


def load_ipynb_documents():
    all_docs = []
    for repo_url, branch, _ in GIT_REPOS:
        repo_name = repo_url.split("/")[-1]
        loader = GitLoader(
            repo_path=f"/tmp/{repo_name}",
            clone_url=repo_url,
            branch=branch,
            file_filter=lambda f: f.endswith(".ipynb")
        )
        docs = loader.load()
        print(f"Loaded {len(docs)} documents from {repo_url}")
        all_docs.extend(docs)
    return all_docs

docs = load_ipynb_documents()


Loaded 1202 documents from https://github.com/langchain-ai/langchain




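### Extracting high-quality chunks from the notebooks

🧼 Step 1: Remove Base64 images
	•	Use regular expressions to strip base64 image tags containing data:image, in both Markdown and HTML form.
	•	Target: bulky image data embedded in Markdown cells.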

remove_base64_images()
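To make the effect of this step concrete, here is a small self-contained run of the same two regular expressions on a made-up cell. The sample string is hypothetical; only the patterns are taken from this notebook.

```python
import re


def remove_base64_images(text: str) -> str:
    # Same two patterns as in this notebook: Markdown image syntax, then HTML <img> tags
    text = re.sub(r'!\[.*?\]\(data:image\/[a-zA-Z]+;base64,[^\)]*\)', '', text)
    text = re.sub(r'<img[^>]*src="data:image\/[a-zA-Z]+;base64,[^"]*"[^>]*>', '', text)
    return text


# Hypothetical cell text: one inline image of each kind, followed by an ordinary image link
sample = (
    'Intro ![chart](data:image/png;base64,iVBORw0KGgo=) end '
    '<img src="data:image/jpeg;base64,/9j/4AAQ" alt="x"> done '
    '![logo](https://example.com/logo.png)'
)
cleaned = remove_base64_images(sample)
print(cleaned)
```

Two limitations follow from the patterns as written. The MIME subtype must be letters only, so `data:image/svg+xml` is not matched. And because `.*?` can run past a closing bracket, an ordinary image link that appears earlier on the same line as a base64 image is removed together with it.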



⸻

### Step 2: Parse each notebook's JSON into a structure
	•	Parse each document with nbformat and split it into cells.

⸻

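### Step 3: Keep only the useful chunks
	•	Markdown cells
	•	Remove Base64 images
	•	Drop cells that consist only of heading lines
	•	Keep only content longer than 50 characters
	•	Code cells
	•	Drop cells that fail the Python syntax check (compile())
	•	Keep only code with at least three newlines, i.e. four or more lines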
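The two filters can be written as standalone predicates, which makes the thresholds easy to check. This is a minimal sketch that mirrors the conditions used in this notebook; the helper names `keep_markdown` and `keep_code` are assumptions introduced here.

```python
import re

# A cell made up only of heading lines such as "# Title" or "## A\n### B"
HEADING_ONLY = re.compile(r"(#+ .+(\n)?)+")


def keep_markdown(source: str) -> bool:
    """Keep Markdown longer than 50 characters that is not headings only."""
    source = source.strip()
    return len(source) > 50 and not HEADING_ONLY.fullmatch(source)


def keep_code(source: str) -> bool:
    """Keep code that compiles and contains at least three newlines."""
    source = source.strip()
    try:
        # compile() only parses the source; nothing is executed
        compile(source, "<cell>", "exec", dont_inherit=True)
    except (SyntaxError, ValueError):
        return False
    return source.count("\n") >= 3


print(keep_markdown("# Title"))
print(keep_code("a = 1\nb = 2\nc = a + b\nprint(c)"))
```

Cells that start with IPython syntax such as `!pip install ...` fail the syntax check and are dropped, even when the rest of the cell is valid Python.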

⸻

Result: text chunks suitable for analysis and training are generated automatically


In [2]:
import re

def remove_base64_images(text: str) -> str:
    # Base64 images in Markdown form: ![alt](data:image/...;base64,...)
    text = re.sub(r'!\[.*?\]\(data:image\/[a-zA-Z]+;base64,[^\)]*\)', '', text)
    # Base64 images in HTML form: <img src="data:image/...;base64,...">
    text = re.sub(r'<img[^>]*src="data:image\/[a-zA-Z]+;base64,[^"]*"[^>]*>', '', text)
    return text

# === Convert notebooks into structured chunks ===
def extract_useful_chunks_from_docs(docs: List[Document]) -> List[str]:
    import nbformat

    extracted_chunks = []
    for doc in docs:
        try:
            if not doc.page_content.strip().startswith("{"):
                raise ValueError("Not JSON format")
            nb_json = json.loads(doc.page_content)
            nb = nbformat.from_dict(nb_json)
        except Exception as e:
            print(f"[ERROR] Notebook parse failed: {e}")
            continue

        for cell in nb.cells:
            source = cell.get("source", "")
            if isinstance(source, list):
                source = "".join(source)
            source = source.strip()

            if cell.cell_type == 'markdown':
                source = remove_base64_images(source)
                if len(source) > 50 and not re.fullmatch(r"(#+ .+(\n)?)+", source.strip()):
                    extracted_chunks.append(source)

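            elif cell.cell_type == 'code':
                try:
                    # Syntax check only; cells with IPython magics (e.g. !pip, %time) fail here and are skipped
                    compile(source, '<string>', 'exec', flags=0, dont_inherit=True)
                    # Keep cells with at least three newlines (four or more lines)
                    if source.count("\n") >= 3:
                        extracted_chunks.append(source)
                except Exception:
                    continue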

    return extracted_chunks
extracted_chunks = extract_useful_chunks_from_docs(docs)





⸻

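🧩 Converting chunks into structured data

⸻

🎯 Goal: organize the extracted chunks with section structure attached

⸻

🏷 Extracting section information
	•	Detect Markdown headings (## or ###) and classify them as follows:
	•	##: Section
	•	###: Subsection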

header_match = re.match(r'^(#{2,3}) (.+)', chunk.strip())
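The behaviour of this pattern is easier to see on a few inputs. The sketch below wraps it in a small helper; `classify_heading` is a name introduced here for illustration, while the regular expression is the one shown above.

```python
import re
from typing import Optional, Tuple

HEADER_RE = re.compile(r'^(#{2,3}) (.+)')


def classify_heading(chunk: str) -> Optional[Tuple[str, str]]:
    """Return ("section" | "subsection", title), or None if the chunk does not start with ## or ###."""
    match = HEADER_RE.match(chunk.strip())
    if match is None:
        return None
    level = "section" if len(match.group(1)) == 2 else "subsection"
    return level, match.group(2).strip()


print(classify_heading("## Setup"))
print(classify_heading("### Installation"))
print(classify_heading("# Title"))
```

Top-level `#` headings and `####` or deeper are not matched, and only the first line of a chunk is read as the title, so `"## Setup\nbody text"` yields the title `Setup`.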



⸻

📚 Buffering and flushing content
	•	Text that is not a heading accumulates in a buffer
	•	When a new heading arrives, flush_buffer() saves it

def flush_buffer():
    # Join the buffered chunks and count the tokens
    ...



⸻

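🔢 Token counts for GPT models
	•	Use tiktoken to measure the token count of each structured chunk in advance
	•	The counts can also drive filtering and control before the chunks reach a model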
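As one possible use of the precomputed counts, chunks can be filtered against a token budget before they are sent to a model. This is a sketch under assumptions: the function name, the default thresholds, and the second and third sample records are hypothetical. Only the record shape (section, subsection, tokens, content) comes from this notebook.

```python
from typing import Dict, List


def filter_by_token_budget(
    chunks: List[Dict], min_tokens: int = 20, max_tokens: int = 2000
) -> List[Dict]:
    """Keep chunks whose precomputed "tokens" value lies within [min_tokens, max_tokens]."""
    return [c for c in chunks if min_tokens <= c["tokens"] <= max_tokens]


# Hypothetical records in the structured-chunk shape
sample = [
    {"section": "Getting Started", "subsection": "Installation", "tokens": 187, "content": "..."},
    {"section": "Getting Started", "subsection": None, "tokens": 5, "content": "..."},
    {"section": "RAG", "subsection": "Check", "tokens": 9000, "content": "..."},
]
kept = filter_by_token_budget(sample)
print(len(kept))
```

Because the counts are stored with each chunk, this filter needs no tokenizer at run time.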

⸻

✅ Output: a list of structured chunks (section / subsection / content / token count)

{
  "section": "Getting Started",
  "subsection": "Installation",
  "tokens": 187,
  "content": "..."
}



⸻

📊 Checking the result

print(f"Total structured chunks: {len(structured)}")



⸻


In [3]:

# === Convert chunks into structured form ===
def structure_chunks(extracted_chunks: List[str]) -> List[dict]:
    structured_chunks = []
    current_section = None
    current_subsection = None
    current_buffer = []
    encoding = tiktoken.encoding_for_model("gpt-4o")

    def flush_buffer():
        if not current_buffer:
            return
        combined = "\n\n".join(current_buffer).strip()
        tokens = len(encoding.encode(combined, disallowed_special=()))
        structured_chunks.append({
            "section": current_section,
            "subsection": current_subsection,
            "tokens": tokens,
            "content": combined
        })
        current_buffer.clear()

    for chunk in extracted_chunks:
        header_match = re.match(r'^(#{2,3}) (.+)', chunk.strip())
        if header_match:
            # Only the first line is read as the heading; any body text in the same chunk is skipped
            level = len(header_match.group(1))
            title = header_match.group(2).strip()
            flush_buffer()
            if level == 2:
                current_section = title
                current_subsection = None
            elif level == 3:
                current_subsection = title
            continue
        current_buffer.append(chunk)
    flush_buffer()
    return structured_chunks
structured = structure_chunks(extracted_chunks)
# === Check the number of items ===
print(f"Total structured chunks: {len(structured)}")



Total structured chunks: 2644


In [None]:
import json
from typing import List
from concurrent.futures import ThreadPoolExecutor, as_completed
from langchain_community.llms.ollama import Ollama

# === Prepare three models (one per machine) ===
ollama_models = [
    Ollama(model="gemma3:4b"),
    Ollama(model="gemma3_1:4b"),
    Ollama(model="gemma3_2:4b"),
]

# === Completion function (with a chosen model) ===
def generate_output_with_model(instruction: str, content: str, model_index: int) -> str:
    llm = ollama_models[model_index]
    system_prompt = "あなたはLangChainやVertexAIなどに詳しいLLMです。テクニカルなドキュメントを、分かりやすく要約してください。"
    response = llm.invoke([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{instruction}\n\n{content}"}
    ])
    # Return the generated text; without this the function returns None and "output" is saved as null
    return response

# === Save to file (append mode) ===
def append_to_jsonl(data: List[dict], path: str):
    with open(path, "a", encoding="utf-8") as f:
        for entry in data:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")

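# === Process a single chunk ===
def process_chunk(i, item):
    section = f"section: {item.get('section')}" if item.get("section") else ""
    subsection = f"subsection: {item.get('subsection')}" if item.get("subsection") else ""
    # Take the first line outside the f-string: a backslash inside an f-string expression is a SyntaxError before Python 3.12
    first_line = item["content"].split("\n")[0]
    topic = f"{section} {subsection} {first_line}"
    instruction = f"{topic}の内容をわかりやすく要約してください。"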
    input_text = item["content"]
    model_index = i % len(ollama_models)
    try:
        output_text = generate_output_with_model(instruction, input_text, model_index)
        print(f"[{i+1}] 補完成功（model {model_index}）：{topic}")
    except Exception as e:
        output_text = ""
        print(f"[{i+1}] 補完失敗（model {model_index}）：{topic} → {e}")
    return {
        "instruction": instruction,
        "input": input_text,
        "output": output_text
    }

# === Parallel execution (up to 50 items, saved incrementally) ===
def convert_and_save_ft_examples(chunks: List[dict], path: str, batch_size: int = 5):
    examples = []
    save_path = path
    total = min(50, len(chunks))  # at most 50 items

    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = {executor.submit(process_chunk, i, item): i for i, item in enumerate(chunks[:total])}
        for count, future in enumerate(as_completed(futures), 1):
            example = future.result()
            examples.append(example)

            if count % batch_size == 0 or count == total:
                append_to_jsonl(examples, save_path)
                print(f"📝 {len(examples)}件を一時保存（{save_path}）")
                examples = []

# === Run ===
convert_and_save_ft_examples(structured, "fine_tune_data_with_output.jsonl")

  Ollama(model="gemma3:4b"),
Failed to multipart ingest runs: langsmith.utils.LangSmithAuthError: Authentication failed for https://api.smith.langchain.com/runs/multipart. HTTPError('401 Client Error: Unauthorized for url: https://api.smith.langchain.com/runs/multipart', '{"error":"Unauthorized: Using outdated v1 api key. Please use v2 api key."}\n')trace=e04b19b6-a9e7-422f-8fd0-7466e1b4937a,id=e04b19b6-a9e7-422f-8fd0-7466e1b4937a
Failed to multipart ingest runs: langsmith.utils.LangSmithAuthError: Authentication failed for https://api.smith.langchain.com/runs/multipart. HTTPError('401 Client Error: Unauthorized for url: https://api.smith.langchain.com/runs/multipart', '{"error":"Unauthorized: Using outdated v1 api key. Please use v2 api key."}\n')trace=1b3c9bc9-55f4-4d6e-948f-0b96528da3c8,id=1b3c9bc9-55f4-4d6e-948f-0b96528da3c8
Failed to multipart ingest runs: langsmith.utils.LangSmithAuthError: Authentication failed for https://api.smith.langchain.com/runs/multipart. HTTPError('401 C

[1] 補完成功（model 0）：  Go to the VertexAI Model Garden on Google Cloud [console](https://pantheon.corp.google.com/vertex-ai/publishers/google/model-garden/335), and deploy the desired version of Gemma to VertexAI. It will take a few minutes, and after the endpoint is ready, you need to copy its number.


Failed to send compressed multipart ingest: langsmith.utils.LangSmithAuthError: Authentication failed for https://api.smith.langchain.com/runs/multipart. HTTPError('401 Client Error: Unauthorized for url: https://api.smith.langchain.com/runs/multipart', '{"error":"Unauthorized: Using outdated v1 api key. Please use v2 api key."}\n')
[2] 補完成功（model 1）：section: LLM  # Local
[4] 補完成功（model 0）：section: Query a SQL Database  # Prompt
[5] 補完成功（model 1）：section: Data Loading  from langchain_text_splitters import CharacterTextSplitter
[6] 補完成功（model 2）：section: Multi-vector retriever  from langchain_core.output_parsers import StrOutputParser
📝 5件を一時保存（fine_tune_data_with_output.jsonl）
[3] 補完成功（model 2）：section: DB  from langchain_community.utilities import SQLDatabase
[8] 補完成功（model 1）：section: Multi-vector retriever subsection: Add to vectorstore import uuid
[7] 補完成功（model 0）：section: Multi-vector retriever subsection: Image summaries import base64
[11] 補完成功（model 1）：section: RAG subsection: Sanity Check ... here is the corresponding summary, which we embedded and used in similarity search.
[10] 補完成功（model 0）：section: RAG subsection: Check # Check retrieval
📝 5件を一時保存（fine_tune_data_with_output.jsonl）
[12] 補完成功（model 2）：section: RAG subsection: RAG Here is the trace where we can see what is passed to the LLM:
[13] 補完成功（model 0）：section: RAG subsection: Considerations This tutorial demonstrates how to implement the Option 2 described [here](https://github.com/langchain-ai/langchain/blob/master/cookbook/Multi_modal_RAG.ipynb) with Generative API on Google Cloud.
[14] 補完成功（model 1）：section: Setup  We use a zip file with a sub-set of the extracted images and pdf from [this](https://cloudedjudgement.substack.com/p/clouded-judgement-111023) blog post. If you want to follow the full flow, please, use the original [example](https://github.com/langchain-ai/langchain/blob/master/cookbook/Multi_modal_RAG.ipynb).
[15] 補完成功（model 2）：section: Setup subsection: Docs import matplotlib.pyplot as plt
[16] 補完成功（model 0）：section: Models  from langchain_openai import OpenAIEmbeddings
📝 5件を一時保存（fine_tune_data_with_output.jsonl）
[18] 補完成功（model 2）：section: Semi-structured RAG  The PDF partitioning used by Unstructured will use: 
[9] 補完成功（model 2）：section: RAG  import io
[19] 補完成功（model 0）：section: Data Loading  from typing import Any
[17] 補完成功（model 1）：section: Models subsection: Tree Constrution from typing import Dict, List, Optional, Tuple
[20] 補完成功（model 1）：section: Multi-vector retriever  We create a simple summarize chain for each element.
📝 5件を一時保存（fine_tune_data_with_output.jsonl）
[22] 補完成功（model 0）：section: RAG  from langchain_core.runnables import RunnablePassthrough
[21] 補完成功（model 2）：section: Multi-vector retriever subsection: Add to vectorstore import uuid
[25] 補完成功（model 0）：section: Multi-vector retriever subsection: Images Note: 
[24] 補完成功（model 2）：section: Multi-vector retriever  # Prompt
[27] 補完成功（model 2）：section: Multi-vector retriever subsection: Sanity Check retrieval Here is our retrieval of that table from the natural language query:
📝 5件を一時保存（fine_tune_data_with_output.jsonl）
[28] 補完成功（model 0）：section: RAG  from langchain_core.runnables import RunnablePassthrough
[26] 補完成功（model 1）：section: Multi-vector retriever subsection: Add to vectorstore import uuid
[30] 補完成功（model 2）：section: Multi-vector retriever  # Prompt
[29] 補完成功（model 1）：section: Data Loading  from typing import Any
[32] 補完成功（model 1）：section: Multi-vector retriever subsection: Add to vectorstore import uuid
📝 5件を一時保存（fine_tune_data_with_output.jsonl）
[31] 補完成功（model 0）：section: Multi-vector retriever subsection: Images import glob
[23] 補完成功（model 1）：section: Data Loading  from typing import Any
[34] 補完成功（model 0）：section: Data Loading  # Path
[35] 補完成功（model 1）：section: Data Loading subsection: Option 2: Multi-vector retriever from langchain_core.output_parsers import StrOutputParser
[37] 補完成功（model 0）：section: Data Loading subsection: Option 2b: Multi-vector retriever w/ image summaries # The vectorstore to use to index the summaries
[33] 補完成功（model 2）：section: RAG  from langchain_core.runnables import RunnablePassthrough
📝 5件を一時保存（fine_tune_data_with_output.jsonl）
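The log shows results arriving out of order and being written in groups of five. That save pattern can be exercised without an LLM. The sketch below swaps the model call for a stub, so `fake_summarize`, `run_in_batches`, `read_jsonl`, and the temporary file are assumptions for illustration only. The batching condition and the append-mode JSONL writing follow the cell above.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def fake_summarize(i: int, text: str) -> dict:
    # Stand-in for the LLM call (hypothetical); it only upper-cases its input
    return {"index": i, "input": text, "output": text.upper()}


def run_in_batches(items, path, batch_size=5, max_workers=3):
    pending = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fake_summarize, i, t) for i, t in enumerate(items)]
        # as_completed yields futures in completion order, not submission order
        for count, future in enumerate(as_completed(futures), 1):
            pending.append(future.result())
            # Flush every batch_size results, and once more at the very end
            if count % batch_size == 0 or count == len(items):
                with open(path, "a", encoding="utf-8") as f:
                    for entry in pending:
                        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
                pending = []


def read_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


tmp_path = Path(tempfile.mkdtemp()) / "demo.jsonl"
run_in_batches([f"chunk {n}" for n in range(12)], tmp_path)
records = read_jsonl(tmp_path)
print(len(records))
```

Because the file is opened in append mode, re-running against the same path adds records rather than replacing them.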
