typhoon-api: https://docs.opentyphoon.ai/en/

# Installation

In [1]:
# %pip install llama-index llama-index-vector-stores-pinecone llama-index-llms-openai-like
# %pip install llama-index-embeddings-huggingface
# %pip install xformers
# %pip install dependency -U
# %pip install datasets einops

In [2]:
# %pip install transformers -U

In [3]:
from dotenv import load_dotenv
import os
load_dotenv()

True
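
Both clients below read their credentials from the environment, so it can help to fail early when a variable is missing. A minimal sketch, assuming the two variable names used by the `os.getenv` calls in this notebook (`TYPHOON_API_BASE` and `PINECONE_API_KEY`); the helper `missing_env_vars` is hypothetical and not part of any library.

```python
import os

# Names taken from the os.getenv / os.environ calls in this notebook;
# adjust them if your .env file uses different ones.
REQUIRED_ENV_VARS = ("TYPHOON_API_BASE", "PINECONE_API_KEY")


def missing_env_vars(names=REQUIRED_ENV_VARS, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Passing a plain dict as `env` makes the check easy to try without touching the real environment.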

## Getting Started

### Get your API Key (Typhoon): https://docs.opentyphoon.ai/en/quickstart/

In [4]:
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="typhoon-v2-70b-instruct",
    api_base="https://api.opentyphoon.ai/v1",
    api_key=os.getenv("TYPHOON_API_BASE"),  # despite its name, this variable must hold the Typhoon API key
    context_window=8192,
    is_chat_model=True,
    is_function_calling_model=False,
)

response = llm.complete("Hello World!")
print(str(response))

Hello! How can I help you today?


### Get Pinecone Vector DB

- Pinecone: https://www.pinecone.io, https://docs.pinecone.io/guides/index-data/create-an-index
- LlamaIndex: https://docs.llamaindex.ai/en/stable/examples/vector_stores/PineconeIndexDemo/

#### Download Data:

In [5]:
# !mkdir -p 'data/paul_graham/'
# !wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

In [8]:
from llama_index.core import SimpleDirectoryReader

# Load the document from the specified directory
documents = SimpleDirectoryReader("data/paul_graham/").load_data()

### Settings

**Understanding namespaces**: https://docs.pinecone.io/guides/index-data/indexing-overview#namespaces
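
A namespace partitions the records inside one index, and a query against a namespace only sees the records stored in it. The toy class below is a minimal in-memory sketch of that idea; `ToyIndex` and the two-dimensional vectors are made up for illustration and have nothing to do with the real Pinecone client.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


class ToyIndex:
    """Hypothetical stand-in for a vector index partitioned by namespace."""

    def __init__(self):
        self._records = defaultdict(dict)  # namespace -> {record_id: vector}

    def upsert(self, namespace, vectors):
        self._records[namespace].update(vectors)

    def query(self, namespace, vector, top_k=1):
        # Only records in the requested namespace are candidates.
        records = self._records.get(namespace, {})
        scored = [(cosine(vector, v), record_id) for record_id, v in records.items()]
        return sorted(scored, reverse=True)[:top_k]


toy = ToyIndex()
toy.upsert("general", {"a": [1.0, 0.0]})
toy.upsert("non_general", {"b": [0.0, 1.0]})

# The best match overall is "b", but a query on "general" can only return "a".
print(toy.query("general", [0.0, 1.0]))
```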

In [9]:
from pinecone import Pinecone

from llama_index.core import Settings
from llama_index.core import StorageContext
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core import VectorStoreIndex

from llama_index.embeddings.huggingface import HuggingFaceEmbedding


In [10]:
Settings.llm = OpenAILike(
    model="typhoon-v2-70b-instruct",
    api_base="https://api.opentyphoon.ai/v1",
    api_key=os.getenv("TYPHOON_API_BASE"),
    context_window=8192,
    is_chat_model=True,
    is_function_calling_model=False,
)

In [11]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Wrap the Hugging Face model in the LlamaIndex embedding interface:
# https://docs.llamaindex.ai/en/stable/examples/embeddings/huggingface/
embed_model = HuggingFaceEmbedding(
    model_name="Qwen/Qwen3-Embedding-0.6B",
    trust_remote_code=True,
    device="cuda",  # requires a CUDA GPU; use "cpu" otherwise
    # embed_batch_size=1,  # keep this small if you're low on VRAM
)

Settings.embed_model = embed_model

In [12]:
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
pinecone_index = pc.Index("quickstart")

**namespace: general**

In [13]:
vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index,
    namespace="general",
    )
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [14]:
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

Upserted vectors:   0%|          | 0/22 [00:00<?, ?it/s]

2.86 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


**namespace: non_general**

In [15]:
vector_store = PineconeVectorStore(
    pinecone_index=pinecone_index,
    namespace="non_general",
    )
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [16]:
%%timeit -r 1 -n 1
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

Upserted vectors:   0%|          | 0/22 [00:00<?, ?it/s]

2 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


### Query Index: Without llama-index

In [34]:
import ast
import json
import os
import re
from functools import cache
from typing import List, Union

import torch
from pinecone import Pinecone
from transformers import AutoConfig, AutoModel, AutoTokenizer

In [53]:
# Config

PINECONE_INDEX_NAME = "quickstart"
MODEL_NAME = "Qwen/Qwen3-Embedding-0.6B"
MAX_LEN = 32768  # maximum number of tokens passed to the tokenizer

In [92]:
@cache
def _device() -> torch.device:
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

@cache
def _tokenizer():
    return AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)

@cache
def _model():
    config = AutoConfig.from_pretrained(MODEL_NAME, trust_remote_code=True)
    # NOTE: Removed the unsupported add_pooling_layer argument here
    model = AutoModel.from_pretrained(
        MODEL_NAME,
        config=config,
        trust_remote_code=True,
    ).to(_device()).eval()
    return model

@cache
def _index():
    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    return pc.Index(PINECONE_INDEX_NAME)

def list_namespaces() -> List[str]:
    """
    Returns a list of all namespaces in the Pinecone index.
    """
    stats = _index().describe_index_stats()
    return list(stats["namespaces"].keys())

def _avg_pool(last_hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """
    Mean-pool the last hidden states over the token dimension,
    ignoring padding (mask=0).
    """
    masked = last_hidden.masked_fill(~mask[..., None].bool(), 0.0)
    return masked.sum(dim=1) / mask.sum(dim=1)[..., None]

def embed_query(text: str) -> List[float]:
    """
    Tokenize the query, run it through the model, average-pool
    the hidden states, normalize, and return a Python list.
    """
    tok = _tokenizer()(
        "query: " + text,
        return_tensors="pt",
        truncation=True,
        max_length=MAX_LEN,
    ).to(_device())

    with torch.no_grad():
        hidden = _model()(**tok)[0]           # last_hidden_state
        emb = _avg_pool(hidden, tok.attention_mask)
        emb = torch.nn.functional.normalize(emb, p=2, dim=1)

    return emb[0].cpu().tolist()

def show(results: dict):
    """
    Print the cosine score, namespace, and the node text 
    (extracted from _node_content → text) for each match.
    """
    for m in results.get("matches", []):
        score = m.get("score", 0.0)
        namespace = m.get("namespace", "N/A")
        raw_node = m["metadata"].get("_node_content", "")
        
        # Parse the JSON blob to extract the "text" field
        try:
            node_obj = json.loads(raw_node)
            content = node_obj.get("text", "").strip()
        except json.JSONDecodeError:
            content = raw_node

        print(f"Score: {score:.3f} | Namespace: {namespace}")
        print(content)
        print("#"*8)


def extract_boxed(text: str, self_reflect: bool = False) -> list[str] | str | None:
    r"""
    Extract the content of the first \boxed{…} in `text`.

    Returns the raw inner string when `self_reflect` is True, otherwise a
    list of namespaces. Returns None if no \boxed{…} is present.
    """
    m = re.search(r"\\boxed\{(.+?)\}", text)
    if not m:
        return None
    content = m.group(1)

    # With self_reflect, return the boxed content as a plain string
    if self_reflect:
        return content

    try:
        return ast.literal_eval(content)
    except Exception:
        return [x.strip() for x in content.split(",") if x.strip()]
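
The pooling in `_avg_pool` and the normalization in `embed_query` can be checked by hand on a tiny example. The sketch below redoes both steps in plain Python; the `hidden` and `mask` values are made up for illustration and are not model output. One caveat worth keeping in mind: query vectors are only comparable to the stored vectors if the prefix and pooling used here match what the embedding wrapper applied at indexing time, and this notebook does not check that.

```python
from math import sqrt


def masked_mean(hidden, mask):
    """Average token vectors, skipping positions where mask is 0 (padding)."""
    kept = [vec for vec, m in zip(hidden, mask) if m]
    dim = len(hidden[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


# Three token positions, two hidden dimensions; the last position is padding.
hidden = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = masked_mean(hidden, mask)  # the padding row does not contribute
unit = l2_normalize(pooled)
print(pooled, unit)
```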

In [32]:
def query_pinecone(query: str, top_k: int = 5, ns: Union[str, List[str]] = "default"):
    """
    Query Pinecone across one or multiple namespaces using a dense embedding.

    Args:
        query (str): Text query to embed and search.
        top_k (int): Number of top results to return.
        ns (Union[str, List[str]]): One or more namespaces to search.
                                    Use "default" to query all available.

    Returns:
        dict: Pinecone query results with metadata.
    """
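    # Normalize namespaces: "default" (or ["default"], the fallback returned
    # by choose_namespaces) means "search every namespace in the index"
    if ns == "default" or ns == ["default"]:
        namespaces = list_namespaces()
    else:
        namespaces = [ns] if isinstance(ns, str) else ns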

    return _index().query_namespaces(
        vector=embed_query(query),
        top_k=top_k,
        namespaces=namespaces,
        metric="cosine",
        include_metadata=True,
        show_progress=True,
    )
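
Conceptually, searching several namespaces means running one query per namespace and keeping the best `top_k` matches overall. The sketch below shows only that merge step on made-up scores and ids; `merge_matches` is a hypothetical helper, not the Pinecone client's implementation.

```python
import heapq


def merge_matches(per_namespace, top_k):
    """Merge {namespace: [(score, id), ...]} into the overall top_k by score."""
    tagged = (
        {"namespace": ns, "id": record_id, "score": score}
        for ns, matches in per_namespace.items()
        for score, record_id in matches
    )
    return heapq.nlargest(top_k, tagged, key=lambda m: m["score"])


# Illustrative per-namespace results (scores and ids are invented).
per_namespace = {
    "general": [(0.9, "g1"), (0.5, "g2")],
    "non_general": [(0.7, "n1")],
}

merged = merge_matches(per_namespace, top_k=2)
for match in merged:
    print(match)
```

Each merged match keeps its namespace, which is what lets `show` print where a result came from.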

In [25]:
results = query_pinecone("What did the author do growing up?", top_k=3, ns=["general"])

In [38]:
# results

In [37]:
show(results)

Score: 0.436
But it didn't feel very valuable to me; I had no idea how to value a business, but I was all too keenly aware of the near-death experiences we seemed to have every few months. Nor had I changed my grad student lifestyle significantly since we started. So when Yahoo bought us it felt like going from rags to riches. Since we were going to California, I bought a car, a yellow 1998 VW GTI. I remember thinking that its leather seats alone were by far the most luxurious thing I owned.

The next year, from the summer of 1998 to the summer of 1999, must have been the least productive of my life. I didn't realize it at the time, but I was worn out from the effort and stress of running Viaweb. For a while after I got to California I tried to continue my usual m.o. of programming till 3 in the morning, but fatigue combined with Yahoo's prematurely aged culture and grim cube farm in Santa Clara gradually dragged me down. After a few months it felt disconcertingly like working at Inter

In [59]:
from openai import OpenAI
import re
import ast
from collections import Counter
import os

In [73]:
client = OpenAI(
    api_key=os.getenv("TYPHOON_API_BASE"),
    base_url="https://api.opentyphoon.ai/v1"
)

In [74]:
# Make a completion request
response = client.chat.completions.create(
    model="typhoon-v2-70b-instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, world!!!"}
    ],
    max_tokens=512,
    temperature=0.6
)

In [75]:
print(response.choices[0].message.content)

Hello! How can I help you today?


In [81]:
def extract_boxed(text: str) -> list[str] | None:
    r"""
    Extract a list of namespaces from:
      - \boxed{…} or \boxed[…]
      - bare bracket literals like [a, b]
    Returns None if nothing matches.
    """
    # matching \boxed{…} or \boxed[…]
    m = re.search(r"\\boxed(?:\{|\[)(.+?)(?:\}|\])", text)
    if m:
        content = m.group(1)

    else:
        stripped = text.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            content = stripped[1:-1]
        else:
            return None

    try:
        return ast.literal_eval(f"[{content}]" if not content.strip().startswith(("[", "{")) else content)
    except Exception:
        parts = [p.strip().strip("'\"") for p in content.split(",") if p.strip()]
        return parts or None

def choose_namespaces(
    question: str,
    available: list[str],
    votes: int = 4,
    top_n: int = 2
) -> list[str]:
    system_prompt = (
        "You are a helpful assistant that classifies questions "
        "into topic namespaces. Only select from the provided list."
    )
    choices_str = ", ".join(available)
    ballots: list[str] = []

    # 1) Send the batch of votes
    response = client.chat.completions.create(
        model="typhoon-v2-70b-instruct",
        messages=[
            {"role": "system",  "content": system_prompt},
            {
                "role": "user",
                "content": (
                    f"Q: {question}\n\n"
                    f"Available namespaces: {choices_str}\n\n"
                    "Choose only the most relevant namespaces, "
                    "and return the result in \\boxed[]."
                ),
            },
        ],
        max_tokens=128,
        temperature=0.6,
        n=votes
    )

    # 2) Show each vote and what it parsed
    print(f"\n=== Voting results ({votes} ballots) ===")
    for i, choice in enumerate(response.choices, 1):
        text = choice.message.content.strip()
        ns = extract_boxed(text) or []
        print(f"Vote {i}: raw → {repr(text)}")
        print(f"         parsed → {ns}")
        ballots.extend(ns)

    if not ballots:
        print("No namespaces extracted; falling back to ['default'].")
        return ["default"]

    # 3) Tally and show counts
    counts = Counter(ballots)
    print("\n=== Tally of all ballots ===")
    for ns, cnt in counts.items():
        print(f"  {ns}: {cnt} vote{'s' if cnt>1 else ''}")

    # 4) Select top_n and filter by available
    selected = [ns for ns, _ in counts.most_common(top_n)]
    valid = [ns for ns in selected if ns in available]
    final = valid or ["default"]
    print(f"\nSelected top {top_n}: {final}\n")

    return final
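
`choose_namespaces` picks the `top_n` labels first and filters them against `available` afterwards, so a label the model invented can take one of the `top_n` slots. A small variation is to drop invalid ballots before counting. The sketch below shows that tally on canned ballots; `tally_votes` is a hypothetical helper and makes no API calls.

```python
from collections import Counter


def tally_votes(ballots, available, top_n=2):
    """Count only ballots naming an available namespace, then take the top_n."""
    valid = [b for b in ballots if b in available]
    counts = Counter(valid)
    # most_common keeps first-seen order among labels with equal counts
    return [ns for ns, _ in counts.most_common(top_n)]


available = ["general", "non_general"]

# Same ballots as a 2-2 split between the two namespaces
print(tally_votes(["non_general", "general", "non_general", "general"], available))

# An invented label no longer pushes a valid one out of the top slot
print(tally_votes(["bogus", "bogus", "general"], available, top_n=1))
```

With an even split, the order of the result depends only on which label appeared first, so a tie says little about which namespace is the better fit.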

In [82]:
Query = "What did the author do growing up?"

In [83]:
available_namespaces = ["general", "non_general"]

In [87]:
selected = choose_namespaces(
    Query, 
    available_namespaces,
)


=== Voting results (4 ballots) ===
Vote 1: raw → '\\boxed[non_general]'
         parsed → ['non_general']
Vote 2: raw → '\\boxed[general]'
         parsed → ['general']
Vote 3: raw → '\\boxed[non_general]'
         parsed → ['non_general']
Vote 4: raw → '\\boxed[general]'
         parsed → ['general']

=== Tally of all ballots ===
  non_general: 2 votes
  general: 2 votes

Selected top 2: ['non_general', 'general']



In [88]:
selected

['non_general', 'general']

In [96]:
results = query_pinecone(Query, top_k=1, ns=selected)

In [97]:
show(results)

Score: 0.437 | Namespace: non_general
But it didn't feel very valuable to me; I had no idea how to value a business, but I was all too keenly aware of the near-death experiences we seemed to have every few months. Nor had I changed my grad student lifestyle significantly since we started. So when Yahoo bought us it felt like going from rags to riches. Since we were going to California, I bought a car, a yellow 1998 VW GTI. I remember thinking that its leather seats alone were by far the most luxurious thing I owned.

The next year, from the summer of 1998 to the summer of 1999, must have been the least productive of my life. I didn't realize it at the time, but I was worn out from the effort and stress of running Viaweb. For a while after I got to California I tried to continue my usual m.o. of programming till 3 in the morning, but fatigue combined with Yahoo's prematurely aged culture and grim cube farm in Santa Clara gradually dragged me down. After a few months it felt disconcertin

In [99]:
matches = results.get("matches", [])

texts = []
for m in matches:
    raw_node = m["metadata"].get("_node_content", "")
    node_obj = json.loads(raw_node)
    texts.append(node_obj.get("text", "").strip())
    
aggregated_context = "\n\n".join(texts)
aggregated_context

"But it didn't feel very valuable to me; I had no idea how to value a business, but I was all too keenly aware of the near-death experiences we seemed to have every few months. Nor had I changed my grad student lifestyle significantly since we started. So when Yahoo bought us it felt like going from rags to riches. Since we were going to California, I bought a car, a yellow 1998 VW GTI. I remember thinking that its leather seats alone were by far the most luxurious thing I owned.\n\nThe next year, from the summer of 1998 to the summer of 1999, must have been the least productive of my life. I didn't realize it at the time, but I was worn out from the effort and stress of running Viaweb. For a while after I got to California I tried to continue my usual m.o. of programming till 3 in the morning, but fatigue combined with Yahoo's prematurely aged culture and grim cube farm in Santa Clara gradually dragged me down. After a few months it felt disconcertingly like working at Interleaf.\n\nY

In [109]:
Query = "What did the author do growing up?"

final_prompt = f"You are a helpful assistant.\n\n### Context: {aggregated_context}"
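
With a larger `top_k`, the joined context can grow past what fits in the model's context window. One simple guard is to stop adding chunks once a size budget is reached. A minimal sketch, assuming a character budget as a rough stand-in for tokens; `build_system_prompt` and `max_chars` are hypothetical names, and the default value is arbitrary.

```python
def build_system_prompt(chunks, max_chars=6000):
    """Join retrieved chunks into the system prompt, staying within max_chars."""
    kept, used = [], 0
    for chunk in chunks:
        cost = len(chunk) + (2 if kept else 0)  # 2 accounts for the "\n\n" separator
        if used + cost > max_chars:
            break  # chunks are assumed to be ordered best match first
        kept.append(chunk)
        used += cost
    context = "\n\n".join(kept)
    return f"You are a helpful assistant.\n\n### Context: {context}"


demo_prompt = build_system_prompt(["aaa", "bbb", "ccc"], max_chars=8)
print(demo_prompt)
```

In the notebook's flow, `texts` would be passed as `chunks` in place of joining them directly.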

In [116]:
# Make a completion request
response = client.chat.completions.create(
    model="typhoon-v2-70b-instruct",
    messages=[
        {"role": "system", "content": final_prompt},
        {"role": "user", "content": Query}
    ],
    max_tokens=2048,
    temperature=0.6
)

In [117]:
print(response.choices[0].message.content)

The text does not provide specific details about the author's childhood or upbringing. However, it mentions that the author was a graduate student before starting the company Viaweb, suggesting that the author pursued higher education prior to becoming an entrepreneur.


# See More: https://github.com/DoTA-RAG/dota-rag